Input Data Processing Techniques in Intrusion
Detection Systems - Short Review 

Suhair H. Amer, and John A. Hamilton, Jr. 



Abstract - In this paper intrusion detection systems (IDSs) are
classified according to the techniques applied to processing
input data. This classification is complex because the
techniques are highly coupled in actual implemented systems.
Eleven input data processing techniques associated with
intrusion detection systems are identified. They are then
grouped into more abstract categories. Some approaches draw on
artificial intelligence, such as neural networks, expert
systems, and agents. Others are computationally based, such as
Bayesian networks and fuzzy logic. Finally, some are based on
biological concepts, such as immune systems and genetics. The
characteristics of each technique, and systems that employ it,
are also described.

I. Introduction

When intrusion detection systems (IDSs) are traditionally
classified as misuse, anomaly or hybrid, they are grouped
according to the technique they use to detect intrusions.
For example, misuse-based IDSs match stored attack signatures
against the audit data gathered while the monitored system is
or was running. Anomaly-based IDSs use models of normal
behavior, and any deviation from such behavior is identified
as an intrusion. Another traditional classification
categorizes an IDS according to its setup as network-based,
host-based or hybrid. Network-based systems monitor network
activities, whereas a host-based system monitors the
activities of a single system for intrusion traces [1]. In
general, IDSs may apply many techniques to detect intrusions
and improve detection, such as neural networks, expert
systems, agents, Bayesian networks, fuzzy logic, immune
systems and genetics. Little attention has been given to
classifying the processing techniques applied to the input
data provided to the IDS. In this paper we classify the input
data processing techniques used with IDSs, which may or may
not be the same techniques the IDS uses to detect intrusions.
Section II presents an abstract classification of the
different input data processing techniques used with IDSs:
eleven techniques are identified and then grouped into more
abstract categories. Section III gives a general description
of each technique, some of its advantages and disadvantages,
and examples of systems that employ it.

II. Classification of Input Data Processing
Techniques in IDSs

In this paper we are concerned with the techniques used
to process the input data that is considered when designing
and implementing IDSs. Classifying such techniques is not
easy because an implemented system may use a combination of
techniques. However, identifying them individually helps to
better understand the merits and limitations of each, and how
to improve one technique's performance by using another.
Eleven techniques that are currently in wide use for
processing the input data of IDSs are identified (shown at
the lower level of Fig. 1). They are then grouped into more
abstract categories, shown at the upper levels of Fig. 1.
This grouping is important because the characteristics of
each technique are strongly affected by the category or
categories to which it belongs. In the lower level of Fig. 1,
techniques such as agents and data mining belong to the
intelligent data analysis category. This is indicated by the
dotted relation between the Data Analysis and AI categories.
Expert systems and fuzzy logic are intelligent, model-based,
rule-based systems, as shown by the dotted relation between
the Rule Based and AI categories in Fig. 1. Each item in
Fig. 1 is explained next, along with some of its identified
characteristics.




Fig. 1. Data processing techniques applied to the input data processed by
intrusion detection systems.

A. Rule Based

If a rule-based IDS is to use input data or audit data, such
information takes the form of codified rules describing known
intrusions. The input data represents identified intrusive
behavior and categorizes intrusion attempts by sequences of
user activities that lead to compromised system states. The
IDS takes as input the predefined rules as well as the
current audit data and checks whether a rule fires. In
general, rule bases are affected by system hardware or
software changes and require updates by system experts as the
system is enhanced or maintained. This input data technique
is very useful in an environment where physical protection of
the computer system is not always possible (e.g., a
battlefield situation) but strong protection is required
[http://www.sei.cmu.edu/str/descriptions/rbid.html].

In general, rule based systems can be: 

1. State-based: in the audit trails, intrusion attempts
   are defined as sequences of system states leading
   from an initial state to a final compromised state,
   represented in a state transition diagram. The two
   inputs to the IDS are the audit trail and the state
   transition diagrams of known penetrations, which are
   compared against each other using an analysis tool.
   One advantage of a state-based representation of data
   is that it is independent of the audit trail record
   and is capable of detecting cooperative attacks and
   attacks that span multiple user sessions. However,
   some attacks cannot be detected because they cannot
   be modeled with state transitions
   [http://www.sei.cmu.edu/str/descriptions/rbid.html].

2. Model-based: intrusion attempts in input data can
   be modeled as sequences of user behavior. This
   approach allows the processing of more data,
   provides more intuitive explanations of intrusion
   attempts and predicts the intruder's next action.
   More general representations of penetrations can be
   generated since intrusions are modeled at a higher
   level of abstraction. However, if an attack pattern
   does not occur in the appropriate behavior model, it
   cannot be detected
   [http://www.sei.cmu.edu/str/descriptions/rbid.html].

B. Artificial Intelligence (AI) 

AI improves algorithms by employing problem-solving
techniques used by human beings, such as learning, training
and reasoning. One of the challenges of using AI techniques
is that they require a large amount of audit data in order to
compute the profile rule or pattern sets. From the audit
trails, information about the system is extracted and patterns
describing the system are generated. In general, AI can be
employed in two ways: (1) Evolutionary (biologically driven)
methods are mechanisms inspired by biological evolution, such
as reproduction, mutation and recombination. (2) Machine
learning is concerned with the design and development of
algorithms and techniques that allow computers to learn. The
major focus of machine learning research is to extract
information from data automatically [2].

C. Data Analysis 

With data analysis, data is transformed in order to extract
useful information and reach conclusions. It is usually used
to confirm or reject an existing model, or to extract the
parameters necessary to adapt a theoretical model to an
experimental one. Intelligent data analysis indicates that the
application performs some analysis associated with user
interaction and then provides insights that are not obvious.
One of the problems faced when applying such an approach is
that most application logs (input information) do not conform
to a specific standard. Logs should therefore be analyzed to
find commonalities, and logs of different types should be
grouped. Another problem is the existence of noise, missing
values and inconsistent data in the actual log information.
Attackers may exploit the fact that logs may not record all
information. Finally, real-world data sets tend to be large
and multidimensional, which requires data cleaning and data
reduction [3].

D. Computational Methods 

Computational intelligence research aims to use learning,
adaptive, or evolutionary algorithms to create programs.
These algorithms allow systems to operate in real time and
detect system faults quickly. However, there are costs
associated with creating audit trails and maintaining input
user profiles, as well as some risks. For example, because
user profiles are updated periodically, the system may come
to accept a new user behavior pattern under which an attack
can be safely mounted. User profiles are also sometimes
difficult to define, especially for users with inconsistent
work habits. In general, there are two types of IDSs that use
a computational method: (1) Statistics-based IDSs identify
audit data that may indicate intrusive behavior. These
systems analyze input audit trail data by comparing it to
normal behavior to find security violations. (2)
Heuristics-based IDSs rely on heuristics; a heuristic can be,
for example, a function that estimates the cost of the
cheapest path from one node to another
[http://www.sei.cmu.edu/str/descriptions/sbid.html].
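
The statistics-based comparison against normal behavior can be illustrated with a minimal sketch. It assumes a single numeric audit feature and a simple z-score test; the function names, the sample values and the threshold of three standard deviations are illustrative assumptions and are not taken from any cited system.

```python
from statistics import mean, stdev


def build_profile(normal_values):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical audit feature: failed logins per hour for one user.
profile = build_profile([2, 3, 1, 2, 4, 3, 2, 3])
print(is_anomalous(3, profile))   # a value close to the profile
print(is_anomalous(40, profile))  # a value far outside the profile
```

If the profile is periodically rebuilt from recent data, a user who changes behavior gradually shifts the mean and widens the deviation, which is the profile-update risk described above.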

III. Capabilities and Examples of Processing
Techniques of Input Data Used by IDSs

Because some IDS data processing techniques interact closely
and are similar, classifying them is complex. However, we
believe that the eleven identified techniques capture most of
the well-known types. For example, in Fig. 1, although expert
systems and fuzzy logic both belong to the AI and rule-based
categories, they have distinguishing characteristics and
usages. The output of an expert system is specific, the data
used to build the system is complete, and the set of rules is
well defined. Fuzzy logic, in contrast, is usually used in
systems where the output is not well defined and is
continuous between 0 and 1.

A. Bayesian networks

Bayesian networks are used to describe the conditional
probability of each of a set of possible causes of an
observed event, computed from the probability of each cause
and the conditional probability of the outcome given each
cause. They are suitable for extracting complex patterns from
sizable amounts of input information that can also contain
significant levels of noise. Several systems have been
developed using Bayesian network concepts. For example,
Scott’s [4] IDS is based on stochastic models of user and
intruder behavior combined using Bayes’ theorem, which
mitigates the complexity of network transactions that have
complicated distributions. Intrusion probabilities can be
calculated, and dynamic graphics allow investigators to use
the evidence to navigate around the system.
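
As a small worked example of the calculation behind such intrusion probabilities, the sketch below applies Bayes' theorem to a model with only two causes, intrusion and normal use. The function name and all probabilities are hypothetical values chosen for illustration; they do not come from Scott's system.

```python
def posterior_intrusion(prior, p_event_given_intrusion, p_event_given_normal):
    """P(intrusion | event) by Bayes' theorem for a two-cause model."""
    evidence = (p_event_given_intrusion * prior
                + p_event_given_normal * (1.0 - prior))
    return p_event_given_intrusion * prior / evidence


# Hypothetical numbers: intrusions are rare (1% of sessions), the observed
# event occurs in 90% of intrusions and in 5% of normal sessions.
p = posterior_intrusion(prior=0.01,
                        p_event_given_intrusion=0.9,
                        p_event_given_normal=0.05)
print(round(p, 3))
```

Even with a strongly indicative event, the low prior keeps the posterior well below one half, which is why such systems combine several pieces of evidence.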

B. Neural networks 

Training neural networks enables them to modify the state of
a system by discriminating between classes of inputs. They
also learn the relationship between input and output vectors
and generalize it to extract new input and output
relationships. They are suitable when the identification and
classification of network activities are based on incomplete
and limited input data sources. They are able to process
data from a number of sources and accept nonlinear signals as
input, but they need a large sample of input information.
Finally, neural networks are not suitable when the
information is imprecise or vague, and they are unable to
combine numeric data with linguistic or logical data. For
example, Bivens et al. [5] employed the time-window method
for detection and were able to recognize long multi-packet
attacks. They identified aggregate trends in the network
traffic in the preprocessing step by looking at only three
packet characteristics. Once trained, the neural network was
able to perform real-time detection on the input data.

C. Data mining 

Data mining refers to a set of techniques that extract
previously unknown but potentially useful data from large
stores of system logs. One of the fundamental data mining
techniques used in intrusion detection is associated with
decision trees [6], which detect anomalies in large
databases. Another technique uses segmentation, where
patterns of unknown attacks are extracted from a simple audit
and then matched with previously warehoused unknown attacks
[7]. Another data mining technique is associated with finding
association rules by extracting previously unknown knowledge
on new attacks and building normal behavior patterns [8].
Data mining techniques allow regularities and irregularities
to be found in large input data sets. However, they are
memory intensive and require double storage: one for the
normal IDS data and another for the data mining. The system
of Lee, Stolfo and Mok [7] was able to detect anomalies using
predefined rules; however, it needed a supervisor to update
the system with the appropriate rules for certain attacks.
The rule generation methodology they developed first defines
an association rule that identifies the relation between
rules and specifies the confidence for the rule.
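
The notion of a rule's confidence can be made concrete with a short sketch. It assumes the standard definitions of support and confidence for association rules; the record layout and the attribute names are hypothetical and are not taken from the cited system.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical audit records, each reduced to a set of attribute=value items.
records = [
    {"service=telnet", "flag=failed", "user=root"},
    {"service=telnet", "flag=failed", "user=root"},
    {"service=telnet", "flag=ok", "user=alice"},
    {"service=http", "flag=ok", "user=alice"},
]

# Rule: service=telnet -> flag=failed
rule_conf = confidence(records, {"service=telnet"}, {"flag=failed"})
print(rule_conf)
```

A rule is usually kept only when both its support and its confidence exceed chosen minimums, which limits the number of rules a supervisor has to review.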

D. Agents 

Agents are self-contained processes that perceive their
environment through sensors and act on it through effectors.
Agents trace intruders, collect only the input information
related to the intrusion along the intrusion route, and then
decide whether an intrusion has occurred on target systems
across the network. One of the major disadvantages of agents
is that they need a highly secure execution environment while
collecting and processing input information. It is also
difficult to propagate agent execution environments onto
large numbers of third-party servers. Several systems have
been developed utilizing agents. Spafford and Zamboni [9]
introduced Autonomous Agents for Intrusion Detection (AAFID),
which uses autonomous agents to perform intrusion detection.
Their prototype provides a useful framework for the research
and testing of intrusion detection algorithms and mechanisms.
Gowadia, Farkas and Valtorta [10] implemented the
Probabilistic Agent-Based Intrusion Detection (PAID) system,
which has a cooperative agent architecture. In their model,
agents are allowed to share their beliefs and perform
updates. Agent graphs are used to represent intrusion
scenarios. Each agent is associated with a set of input,
output, and local variables.

E. Immune based 

Immune-based IDSs are developed based on human immune
system concepts and can perform tasks similar to innate and
adaptive immunity. In general, audit data representing the
appropriate behavior of services is collected, and a profile
of normal behavior is then generated. One challenge is
differentiating between self and non-self data; attempts to
control this cause scaling problems and leave holes in
detector sets.

There have been several attempts to implement immunity-based
systems. Some have experimented with innate immunity, which
is the first line of defense in the immune system and is able
to detect known attacks. For example, Twycross and Aickelin
[11] implemented libtissue, which uses a client/server
architecture acting as an interface for a problem using
immune-based techniques. Pagnoni and Visconti [12]
implemented a native artificial immune system (NAIS) that
protects computer networks. Their system was able to
discriminate between normal and abnormal processes, detect
and protect against new and unknown attacks and accordingly
deny foreign processes access to the server. For adaptive
immunity, two approaches have been studied: negative
selection and danger theory. Kim and Bentley [13] implemented
a dynamic clonal selection algorithm that employs negative
selection by comparing immature detectors to a given antigen
set. Immature detectors that bind to an antigen are deleted
and the remaining detectors are added to the accepted
population. If a memory detector matches an antigen, an
alarm is raised. A recent approach to implementing adaptive
immunity uses the danger theory concept [14]. Danger theory
suggests that an immune response reacts to danger signals
resulting from damage to the cell, and not merely to
something being foreign or non-self to the body.
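
The negative selection step can be sketched as follows. This is the basic form of negative selection rather than the dynamic clonal selection algorithm of [13]; the bit-string encoding, the r-contiguous matching rule and all names and parameters are assumptions made for illustration.

```python
import random


def matches(detector, pattern, r):
    """r-contiguous rule: r or more identical consecutive positions."""
    run = 0
    for a, b in zip(detector, pattern):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r, rng, max_tries=10000):
    """Keep random candidates that match no self pattern."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Candidates that bind to a self pattern are deleted.
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(pattern, detectors, r):
    """Raise an alarm if any surviving detector matches the pattern."""
    return any(matches(d, pattern, r) for d in detectors)


rng = random.Random(0)                 # fixed seed for repeatability
self_set = ["00000000", "00001111"]    # hypothetical normal patterns
detectors = generate_detectors(self_set, 5, 8, 6, rng)
print(detectors)
```

By construction no surviving detector matches a self pattern, so normal data never raises an alarm; non-self patterns that no detector happens to cover correspond to the holes in detector sets mentioned above.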

F. Genetic algorithms 

Genetic algorithms are a family of problem-solving
techniques based on evolution and natural selection.
Potential solutions to the problem are encoded as sequences
of bits, characters or numbers. The unit of encoding is
called a gene, and the encoded sequence is called a
chromosome. The genetic algorithm begins with a population of
chromosomes and an evaluation function that measures the
fitness of each chromosome. The algorithm then uses
reproduction and mutation to create new solutions. In the
system of Shon and Moon [15], the Enhanced Support Vector
Machine (Enhanced SVM) provides unsupervised learning and low
false alarm capabilities. A profile of normal packets is
created without preexisting knowledge. After filtering the
packets, they use a genetic algorithm to extract optimized
information from raw Internet packets. The flow of packets,
based on temporal relationships established during data
preprocessing, is used in the SVM learning.
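
The loop of evaluation, reproduction and mutation described above can be sketched with bit-string chromosomes. The toy fitness function (the number of 1 genes) is an assumption that stands in for a real objective, such as the quality of a selected subset of packet fields; the operators and parameters are likewise illustrative and are not those of [15].

```python
import random


def evolve(fitness, length, pop_size, generations, rng, mutation_rate=0.05):
    """Tiny genetic algorithm over bit-string chromosomes."""
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # selection: keep fitter half
        children = [parents[0][:]]              # elitism: best survives as is
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)    # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation flips each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


best = evolve(fitness=sum, length=16, pop_size=20, generations=40,
              rng=random.Random(1))
print(best, sum(best))
```

Because the best chromosome of each generation is copied unchanged into the next, the best fitness found never decreases from one generation to the next.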

G. Fuzzy logic 

Fuzzy logic is a system of logic that mimics human decision
making, deals with the concept of partial truth, and allows
rules to be expressed imprecisely. Several systems have been
developed using fuzzy logic. Abraham et al. [16] modeled the
Distributed Soft Computing-based IDS (D-SCIDS) as a
combination of different classifiers to model lightweight and
heavyweight IDSs. Their empirical results show that a soft
computing approach could play a major role in intrusion
detection; the fuzzy classifier gave 100% accuracy for all
attack types when all attributes were used. Abadeh, Habibi
and Lucas [17] describe a fuzzy genetics-based learning
algorithm and discuss its use in detecting intrusions in a
computer network. They suggested a new fitness function that
is capable of producing more effective fuzzy rules, which
increased the detection rate as well as the false alarm rate.
Finally, they suggested combining two different fitness
function methods in a single classifier, to use the
advantages of both fitness functions concurrently.
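
Partial truth can be illustrated with a minimal sketch of one fuzzy rule. The triangular membership function and the use of the minimum as fuzzy AND are common textbook choices assumed here; the features, breakpoints and names are hypothetical and do not come from the systems cited above.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_and(*degrees):
    """Conjunction of truth degrees, taken as their minimum."""
    return min(degrees)


# Hypothetical rule: IF connection rate is HIGH AND failed-login ratio
# is HIGH THEN the traffic is suspicious.
rate_high = triangular(70, 40, 100, 160)      # connections per second
fail_high = triangular(0.5, 0.2, 0.8, 1.4)    # failed-login ratio
suspicion = fuzzy_and(rate_high, fail_high)
print(suspicion)
```

The result is a degree of suspicion between 0 and 1 rather than a yes or no answer, so an alarm threshold can be tuned separately from the rule itself.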

H. Expert systems 

Expert systems-based IDSs build statistical profiles of
entities such as users, workstations and application
programs and use statistically unusual behavior to detect
intruders. They work on a previously defined set of rules
that represent a sequence of actions describing an attack.
With expert systems, all security-related events incorporated
in an audit trail are translated into if-then-else rules. The
expert system can also hold and maintain significant levels
of information. However, the acquisition of rules from the
input data is a tedious and error-prone process. The system
of Ilgun, Kemmerer and Porras [18] detects intrusions in real
time based on state transition analysis. The model is
represented as a series of state changes that lead from an
initial secure state to a target compromised state. The
authors developed USTAT, a UNIX-specific prototype of the
state transition analysis tool (STAT), which is a rule-based
expert system that is fed with the state transition diagrams.
In general, STAT extracts the state transition information
recorded within the target system's audit trails and compares
it to a rule-based representation of known attacks that is
specific to the system.
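
The comparison of an audit trail with a state transition diagram can be sketched as a walk through a transition table. The scenario, state names and event names below are hypothetical and are not taken from USTAT; the sketch only illustrates the idea of moving from an initial secure state to a compromised state.

```python
def detect(audit_trail, transitions, initial, compromised):
    """Walk the audit trail through a state transition diagram.

    `transitions` maps (state, event) to the next state; events with no
    matching transition leave the state unchanged.
    """
    state = initial
    for event in audit_trail:
        state = transitions.get((state, event), state)
        if state == compromised:
            return True
    return False


# Hypothetical penetration scenario expressed as two transitions.
transitions = {
    ("secure", "create_link_to_setuid_script"): "link_created",
    ("link_created", "execute_link"): "compromised",
}
trail = ["login", "create_link_to_setuid_script", "read_mail", "execute_link"]
print(detect(trail, transitions, "secure", "compromised"))
```

Unrelated events between the two steps do not reset the walk, which reflects how a state-based representation can follow an attack that is spread out over a session.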

I. Signature analysis or Pattern Matching 

In this approach, the semantic description of an attack is
transformed into the appropriate audit trail format,
representing an attack signature. An attack scenario can be
described, for example, as a sequence of audit events that a
given attack generates. Detection is accomplished by using
text string matching mechanisms. Human expertise is required
to identify and extract non-conflicting elements or patterns
from input data. Kumar’s [19] system is based on the
complexity of matching. Based on the desired accuracy of
detection, he developed a classification to represent
intrusion signatures and used different encodings of the same
security vulnerability. His pattern specification
incorporated several abstract requirements to represent the
full range and generality of intrusion scenarios: context
representation, follows semantics, specification of actions
and representation of invariants.
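
A signature expressed as a sequence of audit events can be matched with plain string search, as in the sketch below. It assumes that a signature is an ordered list of substrings that must appear in successive audit lines, possibly with other lines in between; the log format and the example signature are hypothetical.

```python
def matches_signature(audit_lines, signature):
    """True if each signature pattern appears, in order, in the audit lines."""
    lines = iter(audit_lines)
    # Each pattern is searched for only in the lines after the previous match.
    return all(any(pattern in line for line in lines)
               for pattern in signature)


# Hypothetical audit trail and signature (copy a shell, then make it setuid).
audit = [
    "uid=1001 cmd=cp /bin/sh /tmp/.x",
    "uid=1001 cmd=chmod 4755 /tmp/.x",
    "uid=1001 cmd=/tmp/.x",
]
signature = ["cp /bin/sh", "chmod 4755"]
print(matches_signature(audit, signature))
```

A trivially different encoding of the same attack, such as another file name or tool, would escape this exact match, which is one reason different encodings of the same vulnerability have to be represented.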

J. State machines 

State machines model behavior as a collection of states,
transitions and actions. An attack is described by a set of
goals and transitions that an intruder must achieve to
compromise a system. Several systems have been developed
using this technique. Sekar et al. [20] employ state-machine
specifications of network protocols that are augmented with
information about the statistics that need to be maintained
to detect anomalies. The protocol specifications simplified
the manual feature selection process used in other anomaly
detection approaches. The specification language made it
easy to apply their approach to protocols at other layers,
such as HTTP and ARP. Peng, Leckie and Ramamohanarao [21]
proposed a framework for distributed detection systems.
They improved the efficiency of their system by using a
heuristic to initialize the broadcast threshold and a
hierarchical system architecture. They also presented a
scheme that detects the abnormal packets caused by a
reflector attack by analyzing the inherent features of the
attack.

K. Petri nets 

Colored Petri nets are used to specify control flow in
asynchronous concurrent systems. A Petri net graphically
depicts the structure of a distributed system as a directed
bipartite graph with annotations. It has place nodes,
transition nodes and directed arcs connecting places with
transitions. Srinivasan and Vaidehi [22] present a general
model based on timed colored Petri nets that is capable of
handling patterns generated to model attack behavior as a
sequence of events. This model also allows an attack to be
flagged when the behavior of one or more processes matches
the attack behavior. Their use of a graphical representation
of a timed colored Petri net gives a straightforward view of
the relations between attacks.
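
The places, transitions and arcs described above can be sketched with a plain place/transition net. Token colors and timing, which the model of [22] relies on, are deliberately left out; the place names and the example transition are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["inputs"].items())


def fire(marking, transition):
    """Consume input tokens and produce output tokens; returns a new marking."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Hypothetical attack pattern: one probe followed by three failed logins.
flag_attack = {
    "inputs": {"probe_seen": 1, "failed_logins": 3},
    "outputs": {"attack_flagged": 1},
}
marking = {"probe_seen": 1, "failed_logins": 3}
after = fire(marking, flag_attack)
print(after)
```

The attack is flagged only when tokens for all required events are present at the same time, which is how a net can combine events produced by more than one process.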

IV. Conclusion 

Choosing an IDS to deploy in an environment would seem to be
simple; however, with the different components, types and
classifications, such a decision is quite complex. There have
been many attempts to classify IDSs as a means of
facilitating the choice of better solutions. In this paper we
classified IDSs according to the data processing techniques
applied to input information. Careful design may allow an IDS
to be implemented correctly. However, the actual merits and
limitations of each approach, which are also discussed in
this paper, indicate that complete security and the different
desirable system characteristics cannot be achieved by
employing only one type of implementation approach. The data
processing techniques were grouped into general (abstract)
categories and then further expanded into eleven more
specialized techniques.

We discussed and summarized the characteristics of each
technique, followed by examples of systems developed using
it. Fig. 1, for example, helps us understand that we can use
the state machine technique to build an IDS, and that we can
add intelligence to it and use the expert system technique,
with added merits and costs. The merits are the ability to
perform and provide intelligent actions and answers.
Unrealistic actions or answers can be refuted or ignored. It
also borrows from statistics the ability to detect intrusions
without prior information about the security flaws of a
system. Some of the incurred costs are the conflicting
requirements of maintaining a high volume of data, which
affects throughput, and of selecting appropriate thresholds
that lower false positives and false negatives. To conclude,
the appropriate technique should be selected carefully. Each
organization should state its requirements and acceptable
costs prior to development. Accordingly, the selected system
should be able to meet most of the requirements, as complete
security cannot be achieved.

Abstract - Nowadays research and development activities are
accompanied by an increasing focus on future user needs in the
field of multimedia retrieval. The fast growth of multimedia
data repositories is an undeniable fact, so specialized tools
allowing storage, indexing and retrieval of multimedia content
have to be developed, and easy-to-use content exchange is also
needed. The transition from text to photo retrieval makes it
necessary to generate, store and visualize additional
meta-information about the content to allow semantic
retrieval. “NWCBIR”, a prototype allowing semantic annotation
of digital photos based on MPEG-7 standards [4], is presented
as a possible new way of handling semantics in descriptions of
multimedia data.

Keywords: Semantic annotation, MPEG-7, NWCBIR.

I. Introduction 

The evolution of digital information repositories produces
more and more specialized requirements for intelligent
information retrieval. Numerous research and development
teams are doing fundamental research on various previously
unforeseen topics, such as managing more than 300 TV channels
with a remote control without losing orientation. The basis
for future interdisciplinary developments is a set of
generally agreed standards and standardized methods. Using
the following scenario, we examined the possibilities that
current technologies such as MPEG-7 [3] offer in the context
of one real-world problem.

Digital camera users produce a lot of images throughout the
year and save them to personal computers. After some time
the number of photos exceeds the critical mass for being
manageable without specialized tools. Most people create
an intuitive structure for storing their personal image
library. They create folders for images that are taken in the
same context, for example “Photos from LC Convention Meet
June 2008” or “Spiritual Tour Photos”. Nevertheless, this
does not enable the user to find a photo that shows a
specific person or object, or that expresses a specific idea
or feeling, when needed. Some file formats like TIFF and
JPEG permit the user to enrich the visual information with
structured textual descriptions, but they offer only limited
retrieval capabilities. MPEG-7 offers a whole range of
descriptors to annotate images with manually or
automatically generated metadata [2]. The picture can be
described in many different ways, regarding for example its
quality, its technical attributes, its instances (thumbnails,
high resolution, and so on) and its content from either a
technical or a semantic point of view. The prototype NWCBIR
system allows digital photos to be annotated manually and
automatically extracts content-based low-level features from
the image.

II. Existing Metadata Standards for Describing Multimedia

The standard used to define the way metadata is handled has
to be much more powerful than EXIF or, for instance, Dublin
Core (DC) [5]. DC defines only 15 core qualifiers, which can
be understood as metadata tags to be filled in by the user. A
combination of Dublin Core and adapted Resource Description
Framework (RDF) structures would at least permit structured
storage of graphs and a quality rating, although
content-based image retrieval would not be supported. EXIF
information can be imported into an RDF-based structure. The
main argument against RDF is that there exists, at this time,
no standardized structure for saving all or most of the
metadata defined in the requirements above. Although it would
not be impossible to create such a structure, gaining
interoperability with other systems and implementations would
require agreeing on the same RDF-based enhancements with all
other developers or vendors. Based on these facts, a much
better choice is MPEG-7 [1].

III. Realization of NWCBIR Annotation Tool 

As MPEG-7 is a complex XML-based standard, it would not be a
good idea to confront the user with an XML editor and an
instruction manual as tools for expressing the semantics of a
photo. To deal with large description graphs, a visualization
of the graph, together with the possibility of editing it
interactively, is necessary. As a result, the NWCBIR System’s
Annotation Tool was designed to support the user in the
time-consuming task of annotating photos.

The Annotation Tool was implemented using Sun Java SDK 1.4;
JRE versions 1.4 and higher are supported as the runtime
environment. For XML handling, the libraries JDOM and JAXEN
are used since they provide high-level functions for dealing
with XML-based content, which speeds up development
significantly. For reading the EXIF information stored in the
images, Drew Noakes’ exifExtractor classes were used.

Since NWCBIR is a Java Swing application, the design started
with creating a user interface that separates the annotation
methods from the image preview and file browsing mechanisms.
The annotation methods were separated from each other by
extending a JPanel GUI element for each method or logical
group of methods. As shown in Fig. 1, there are panels for
creating the ColorLayout and ScalableColor descriptors, which
are extracted from the image on first loading. There is the
so-called “creation panel”, which shows the EXIF tags and
values and holds the creator of the image, and there is the
“metadata description panel” for defining the version and
author of the metadata description. The “quality rating
panel” is used for assigning a quality value and defining the
person who rated the image quality, and the “text annotation
panel” allows the input of a simple textual description of
the image contents. Since a series of photos should be
annotated in a short time, the file browsing tool is a
specialized table that allows the user to select an image in
a fast and intuitive way. A preview panel is also required to
allow the user to examine the image; in addition, a full-size
preview has been implemented, as well as the possibility of
defining an external image viewer, which can be called using
a keyboard command so that users can use their favorite
tools.




Fig. 1: Simplified UML diagram of NWCBIR System’s Annotation Tool 

The central part of the Annotation Tool is the so-called
“semantic description panel”. It allows the user to define
semantic objects like agents, places, events and times, which
are saved on exit so that they can be reused the next time
NWCBIR is started. These semantic objects can also be
imported from an existing MPEG-7 file to allow the exchange
of objects between users and the editing and creation of
those objects in a tool the user prefers. Semantic objects
can be used for creating the description by dragging and
dropping them onto the blue panel with the mouse, as shown in
Fig. 2. While testing this model, we found that it is
sufficient for users who often take pictures of the same
persons and objects, and who take pictures in series, which
is quite often the case with hobby and amateur photographers.
Once the objects exist, they can be reused if some pictures
or series have the same context. This is especially true for
objects representing persons, animals and places, such as
relatives, colleagues, friends, favorite pets or places like
“at home” or “at work”.



Fig. 2: Creating a semantic description using NWCBIR System by drawing a graph as abstraction of the semantics.



After dropping all the needed objects onto the blue panel the 
user can interconnect these objects by drawing relations 
between them using the middle mouse button. The graph, 
which is generated through these user interactions with 
NWCBIR, can be saved as part of an MPEG-7 description. 
In addition to the ability to create a new graph, NWCBIR is 
also a tool for importing, editing and deleting existing 
graphs or sub graphs. 
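
The graph built on the blue panel can be pictured as a small labelled, directed graph. The following is a minimal sketch under stated assumptions, not the NWCBIR implementation: the class names, the relation labels and the sample objects are hypothetical and only illustrate how reusable semantic objects and the relations drawn between them might be held in memory before being saved.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SemanticObject:
    """A reusable node such as an agent, place, event or time."""
    name: str
    kind: str  # e.g. "agent", "place", "event", "time"


@dataclass
class SemanticGraph:
    """Directed, labelled graph describing what one photo shows."""
    nodes: set = field(default_factory=set)
    relations: set = field(default_factory=set)  # (source, label, target)

    def add(self, obj):
        self.nodes.add(obj)

    def relate(self, source, label, target):
        # Both objects must already be on the panel before they are connected.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("both objects must be added before relating them")
        self.relations.add((source, label, target))

    def remove(self, obj):
        # Deleting an object also deletes every relation that touches it.
        self.nodes.discard(obj)
        self.relations = {r for r in self.relations if obj not in (r[0], r[2])}


# Hypothetical objects that a user might keep for reuse across photos.
anna = SemanticObject("Anna", "agent")
home = SemanticObject("at home", "place")
party = SemanticObject("birthday party", "event")

graph = SemanticGraph()
for obj in (anna, home, party):
    graph.add(obj)
graph.relate(anna, "participates in", party)
graph.relate(party, "located at", home)
```

Because the objects are immutable values, the same object can appear in the graphs of many photos, which mirrors the reuse described above.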

Furthermore, a whole series can be pre-annotated to simplify
and speed up the task of annotating multiple images. All
images within the same context are placed in one file system
folder and the user opens the first one using NWCBIR. The
user then defines a “base” description which is the same for
all images of the series: the creator, a base textual
description like “on our visit to Pudukkottai”, and a base
graph including the location where the photos were taken and
the time and motivation for taking them. Once this minimal
description is finished, the so-called “autopilot” can be
used. It opens all images in the defined folder
sequentially, calculates the visual descriptors (a rather
time-consuming task, depending on the size and resolution of
the image), extracts the EXIF data and image-specific
parameters, creates a thumbnail instance of the image for
later retrieval and saves the base description.

Fig. 3: Flow chart showing the Annotation Process in NWCBIR.

A positive effect is that, when one of the pre-annotated
photos is opened afterwards, the thumbnail instance and the
visual descriptors can be read from the existing metadata
and do not have to be created again, which saves time when
opening a single image for editing. The entire annotation
process is shown in the flow chart in figure 3.
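
The pre-annotation pass and the later reuse of stored results can be sketched as follows. This is an illustration only: the function name `autopilot`, the three hook functions and the sample file names are assumptions made for the example and are not part of NWCBIR. The expensive step (descriptor extraction and thumbnail creation) is represented by a stand-in function.

```python
def autopilot(image_paths, base_description, annotate, load_existing, save):
    """Attach the shared base description to every image of a series.

    The three hooks are supplied by the caller (hypothetical names):
      annotate(path)       -> dict with the expensive per-image data
      load_existing(path)  -> previously saved dict, or None
      save(path, record)   -> persist the record
    """
    processed = []
    for path in image_paths:
        record = load_existing(path)
        if record is None:
            # Expensive step: runs only when no metadata exists yet.
            record = annotate(path)
        record = {**record, **base_description}
        save(path, record)
        processed.append(path)
    return processed


# In-memory demonstration with stand-in hooks.
store = {}
expensive_calls = []


def fake_annotate(path):
    expensive_calls.append(path)
    return {"thumbnail": f"thumb-{path}", "descriptors": "computed"}


base = {"creator": "A. Photographer", "text": "on our visit to Pudukkottai"}
photos = ["img_001.jpg", "img_002.jpg"]

autopilot(photos, base, fake_annotate, store.get, store.__setitem__)
# A second pass reads the stored records instead of recomputing them.
autopilot(photos, base, fake_annotate, store.get, store.__setitem__)
```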

Inside an MPEG-7 document the MediaProfile descriptor is 
used to reference instances of the media, described by the 
metadata. As well as the original image, which is 
referenced in the master profile, a thumbnail instance, 
created by NWCBIR if not already present, is referenced in 
another MediaProfile. 

IV. Results 

Annotating digital photos is a very time-consuming task. A
very common problem is the extraction of existing and
computable metadata like EXIF information and visual
descriptors like ColorLayout, ScalableColor and
EdgeHistogram. The time needed to extract a visual
descriptor is proportional to the resolution of the image.
This time can easily be reduced by using a faster computer,
by extracting the metadata on a server, or by extracting it
in parallel with the user's interactions.

Another problem is the interactive creation of the graph by
the user. The creation of the main objects takes a lot of
time. In another project, where all data comes from a
specific context, a pre-built ontology-based catalogue of
semantic objects is used, which includes at least 95 percent
of the needed objects. Using a catalogue like this in the
context of a personal digital photo library does make sense,
but it has to be updated and extended successively. If the
user only takes photos in small numbers and on rare
occasions, like two at a birthday party, three on a holiday,
and one of a newly washed car, administrating and enhancing
a catalogue of semantic objects demands more effort than
typing in a textual description for each photo. Besides, in
this case the ability to retrieve annotated images will not
be needed, because the number of photos will not exceed the
critical mass beyond which the user can no longer keep an
overview of all the images.

Since it is a prototype, the graphical user interface of the
annotation tool is more or less an abstraction of the MPEG-7
descriptors. This is not intuitive for a user who does not
really know MPEG-7, so a lot of work remains to be done in
“hiding” MPEG-7 from the user.

Existing metadata should not be lost while annotating the
photos, but included in the MPEG-7 document. There are
various ways of storing additional information inside an
image. The two most common are EXIF, which is used by most
digital camera manufacturers to save technical data about
the photo, and a standard created by the IPTC, which is used
for instance by the popular application Adobe Photoshop. The
first one is very common and Java libraries for reading this
information exist, while for the second one no Java
implementation exists. A very interesting effect is that
EXIF allows the creator of the metadata to store the same
information in different ways, which complicates a
camera-independent implementation. We found that Sony stores
three tags defining the time when the photo was taken,
“DateTime”, “DateTimeDigitized” and “DateTimeOriginal”,
while Kodak only uses the third one.
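
One way to cope with such camera differences is to read the date tags in a fixed order of preference and take the first usable one. The sketch below assumes the tags have already been read into a plain dictionary by some EXIF library; the function name, the preference order and the sample values are assumptions made for illustration, not the method used in NWCBIR.

```python
from datetime import datetime

# Preference order: the moment the picture was taken comes first.
DATE_TAGS = ("DateTimeOriginal", "DateTimeDigitized", "DateTime")
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"  # layout EXIF uses for date strings


def capture_time(exif_tags):
    """Return the capture time from a dict of EXIF tag name -> string value."""
    for tag in DATE_TAGS:
        raw = exif_tags.get(tag)
        if not raw:
            continue
        try:
            return datetime.strptime(raw.strip(), EXIF_DATE_FORMAT)
        except ValueError:
            continue  # malformed value: fall back to the next tag
    return None


# Made-up sample data mirroring the two cameras described above.
sony = {
    "DateTime": "2008:05:01 10:20:00",
    "DateTimeDigitized": "2008:05:01 10:15:30",
    "DateTimeOriginal": "2008:05:01 10:15:30",
}
kodak = {"DateTimeOriginal": "2008:05:02 18:00:05"}
```

With this order, both sample cameras yield the value of “DateTimeOriginal”, and a file that only carries “DateTime” still produces a result.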

V. Conclusion 

MPEG-7 matches many of the current requirements for a 
metadata standard for usage in a personal digital photo 
library and it defines a lot more useful descriptors, which 
could be integrated as features in such libraries. In addition 
it is not only a standard for describing the content of images, 
but it also defines ways to annotate video and audio 
documents and it is prepared for general usage with 
multimedia data.

Abstract - The primary goal of this paper is to analyze the
quality of Website design of various Websites of universities in
India and to identify the causes that affect the quality of
Website design. The Website of each university is scanned
using W3C guidelines, which are the basis for any type of Web
application. Parameters such as Web page size, download
time, broken links, Web page errors etc. are considered in
identifying the qualitative measures for Website design.

Different kinds of tools are used to examine the components of
the Websites of various Indian universities. These tools include
the W3C Link Checker, W3C Markup Validation Service,
Webpage Analyzer and Website Extractor. The W3C Link
Checker accepts the URL address of a Web page and parses each
and every hyperlink to find broken links in the page. The W3C
Markup Validation Service finds errors in the usage of HTML
tags, in the properties of the Web page and against the
standards for Web pages set by the W3C. The Webpage
Analyzer finds the number of objects used in each Web page,
the Web page size, the download time etc. The Website Extractor
extracts the URL addresses of all Web pages of the Website.

Keywords : Website, Page Size, Download time, Web page 
errors, W3C Link checker, W3C Markup Validation 
Service, Webpage Analyzer and Website Extractor. 

I. Introduction 

A Website is a collection of Web pages containing text,
images, audio, video etc. The Web is thus a vast
collection of completely uncontrolled documents. Today, the
Web is not only an information resource but is also
becoming an automated tool in various applications.

Due to the increasing popularity of the WWW, one must be very
careful in designing a Website. If the Website is not
designed properly, the user may face many difficulties in
using it. For example, if a student wants to join a course
in a university online, the Website must provide the
maximum facilities to the candidate so that he or she does
not face any difficulty in the admission process. To design
a high-quality Website, one has to follow certain
guidelines for achieving quality Web design. Despite
many recommendations, ideas and guidelines, designing a
quality Website is still a burning problem. It is suggested
in [1] that Web design is always a continuous process. The
authors Vincent Flanders and Michel Wills [2] said that
design should always be improved toward good design by
looking at bad design. As a part of this work, this paper
tries to identify various qualitative measures from
existing designs. It presents various aspects of
analyzing the quality of Website design. The research was
carried out as a case study.

II. Tools used in analyzing process 

A case study was conducted on the structure, content and
other functional aspects of Indian universities’ Websites.
The main modules of each university’s Website are
departments, courses, administration, staff, library,
admissions, examinations etc. Screen shots of some of the
university Websites are shown in Figure 1.




Fig. 1. Snapshots of some universities: An Example 



Analysis was carried out using various tools. These include 
W3C link checker, W3C Markup Validation Service, Web 
Page Analyzer and Website Extractor. 

A. W3C Link Checker: 

The W3C Link Checker [3] finds the number of broken links in
the Website. It accepts the URL address of a Web page and
parses each and every hyperlink in the page. It finds the
status code of each link and uses the status code to
identify the broken links related to the page. A screen shot
of the W3C Link Checker is shown in Figure 2.

B. W3C Markup Validation Service:

The W3C Markup Validation Service [4] finds the errors
related to the HTML pages. It validates the Web page with
regard to errors in HTML tags, properties of the Web page
and the standards for Web pages set by the W3C. A screen
shot of the W3C Markup Validation Service is shown in
Figure 3.

Fig. 3. HTML Validator to find markup errors of various Web
pages of university Website

C. Web Page Analyzer:

The Web Page Analyzer [5] finds the number of objects
used in each Web page, the Web page size and the download
time of all objects. It accepts the URL address of a Web page
and generates a report containing details like the number of
image files, number of HTML files, number of script files,
download time etc. of the Web page. A screen shot of the
Webpage Analyzer is shown in Figure 4.

Fig. 4. Website Optimization using Web Page Analyzer

D. Website Extractor:

The Website Extractor [4] extracts all the components of a
Website. It accepts a Website address and produces the URL
addresses of all Web pages. A snapshot of the Website
Extractor is shown in Figure 5.

Fig. 5. Website Extractor to extract all Web pages of
university Website
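
The link checking and URL extraction steps described for these tools can be illustrated with a small sketch. It is not the code of the W3C Link Checker or of the Website Extractor: the function names are hypothetical, and the status lookup is passed in as a callable (`get_status`, an assumption of this example) so that the logic can be shown without network access. A link is treated as broken when its status code is 400 or higher, or when no status could be obtained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(page_url, html):
    """Return absolute URLs of all hyperlinks on the page, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(page_url, href) for href in collector.hrefs]


def same_site(page_url, links):
    """Keep only links that stay on the host of the page being scanned."""
    host = urlparse(page_url).netloc
    return [link for link in links if urlparse(link).netloc == host]


def broken_links(links, get_status):
    """Return links whose status code is >= 400 or could not be obtained."""
    broken = []
    for link in links:
        status = get_status(link)  # caller-supplied lookup, None if unreachable
        if status is None or status >= 400:
            broken.append(link)
    return broken
```

Repeating `extract_links` and `same_site` on every newly found page is the basic idea behind collecting all page addresses of one site.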



III. Guidelines framed from W3C: 

World Wide Web Consortium (W3C) [6], [7] defines a set 
of guidelines for quality Web design. All guidelines are 
summarized into 12 guidelines for simplicity. Every 
guideline provides a technique for accessing the content of 
Website. The guidelines are as follows. 

Guideline 1: Provide a text equivalent for every non-text
element. This includes images, graphical representations of 
text, image map regions, animations, applets and 
programmatic objects, frames, scripts, spaces, audio and 
video files. 

Guideline 2: Do not rely on color scheme only. The content 
of Web page must match with foreground and background 
color. Also provide sufficient contrast to the content for 
visibility. 

Guideline 3: Use markup and style sheets instead of images
to convey information. Style sheets control the layout and
presentation of the Web page and decrease the download
time of the Web page.

Guideline 4: Clearly mention the text information of the Web
page with natural language. Specify the expansion of each
abbreviation or acronym in the document.

Guideline 5: Use tables properly in the Web document. For 
data tables, clearly specify row and column headers and 
number of rows and columns exactly. 

Guideline 6: Ensure that Web pages featuring new 
technologies transform gracefully. When dynamic contents 
are updated, ensure that content is changed. Ensure that 
pages are available and meaningful when scripts, applets or 
other programmatic objects are not supported by the 
browsers. If this is not possible, provide equivalent 
information as alternative in the Web page. 

Guideline 7: Ensure user control of time sensitive content
changes. Until user agents provide the ability to stop the
refresh, do not create periodically auto-refreshing pages.

Guideline 8: Ensure direct accessibility of embedded user
interfaces. Make programmatic elements such as scripts and 
applets directly accessible or compatible with assistive 
technologies. 

Guideline 9: Design for device-independence. Ensure that 
any element that has its own interface can be operated in a 
device-independent manner. 

Guideline 10: Provide context orientation information. Title 
each frame to facilitate frame identification and navigation. 
Divide large blocks of information into more manageable 
groups wherever appropriate. 

Guideline 11: Provide clear navigation mechanisms. Clearly 
identify the target of each link. Provide information about 
the general layout of a site such as site map or table of 
contents. 

Guideline 12: Ensure that documents are clear and simple. 
Create a style of presentation that is consistent across pages. 
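
Some of these guidelines can be checked mechanically. As an illustration of Guideline 1, the sketch below lists the images of a page that carry no text equivalent. The class and function names are hypothetical, and treating an empty or blank `alt` attribute as missing is a simplifying assumption of this example rather than a rule taken from the guidelines.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Records the src of every <img> tag that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption of this sketch: blank alt text counts as missing.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "<no src>"))


def images_missing_alt(html):
    """Return the sources of all images without a text equivalent."""
    checker = MissingAltChecker()
    checker.feed(html)
    return checker.missing
```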

IV. Methodology 

The study was conducted on the Websites of nearly 50 Indian
universities, covering approximately 5000 Web pages.
A Web program was developed to study each university’s
Website. The Web program consists of four modules:
Website Extractor, Link Checker, HTML Validator and
Web Page Analyzer. The URL address of each Website is
scanned using the Website Extractor to get all Web pages
of the Website, and the Web pages of each university are
stored in separate files. Each Website is verified with the
Link Checker module to get the number of broken links in the
Website. The components of each Web page (text, images,
forms, graphics, audio and video files etc.) and its
download time are gathered using the Web Page Analyzer and
stored in a separate file. The Web page errors related to
HTML tags are traced using the W3C HTML Validator and
stored in files. The overall structure of the Web program
is shown in Figure 6.

Fig. 6. Architecture of Web program



The components of some universities’ Websites, such as the
number of Web pages, size, number of errors, download
time, broken links etc., are summarized in Table 1.
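
The column “Average no. of errors in each page” of Table 1 follows from two other columns: the total number of Web page errors divided by the number of Web pages. A short sketch of that calculation is given below; the page and error counts are the ones reported in Table 1 for three of the sites, while the function and variable names are only illustrative.

```python
# (number of Web pages, total Web page errors) as reported in Table 1
table1 = {
    "Anna University": (117, 2145),
    "Bangalore University": (659, 19555),
    "JNT University": (91, 3885),
}


def average_errors_per_page(pages, total_errors):
    """Average number of errors per page of one Website."""
    if pages <= 0:
        raise ValueError("a Website must have at least one page")
    return total_errors / pages


averages = {
    name: round(average_errors_per_page(pages, errors), 5)
    for name, (pages, errors) in table1.items()
}
```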

V. Website Errors 

The Web page errors reported by the Web program are used
to identify the measures for quality Website design. These
errors are further divided into major and minor errors
using statistical techniques [8].

A. Major errors: 

The major errors directly affect the quality of Website
design; developers must concentrate on this category of
errors and eliminate them. The major errors include:
broken links, document type declaration errors, applet
usage errors, server connectivity errors, image load
errors, frames tag usage errors and title tag with no
keyword errors. The major errors are proportional to the
download time of the Web pages: if major errors are
minimized, the download time is reduced accordingly, which
leads to better quality. The major errors of some
universities’ Websites are shown in Table 2. Figure 7
shows a graph of the different major errors and their
effect on Website design.

Major Errors Report

Fig. 7. Graph showing major errors of Websites of various
universities

B. Minor errors 

The minor errors are HTML tag errors, which may cause
incorrect display of some components of Web pages. The
minor errors include: table tag errors, body tag errors,
image tag errors, head tag errors, font tag errors, script tag
errors, style tag errors, form tag errors, link tag errors
and other tag errors. Developers must take care that Web
pages are properly designed with appropriate HTML tags.
The minor errors of some universities’ Websites are given
in Table 3. The graph in Figure 8 shows the minor errors
of various universities’ Websites.






Minor Errors Report

Fig. 8. Graph showing minor errors of Websites of various
universities

Sno  University Name                         Website address   No. of     Total Web   Total no. of     Average no. of  Download time  No. of
                                                               Web pages  pages size  Web page errors  errors in each  at 28 Kbps     broken
                                                                                      in Website       page            (in secs)      links
1    Anna University                         www.annauniv.edu  117        6187357     2145             18.33333        1713           21
2    Bangalore University                    www.bub.ernet.in  659        18297162    19555            29.67375        5065           117
3    Bharathiar University                   www.b-u.ac.in     182        7571965     5568             30.59341        2096           29
4    Indira Gandhi National Open University  www.ignou.ac.in   252        9881675     4573             18.14683        2736           56
5    JNT University                          www.jntu.ac.in    91         5663958     3885             42.69231        1568           28
6    Jawaharlal Nehru University             www.jnu.ac.in     224        8749265     8894             39.70536        2422           67

Table 1



Sno  University                                Broken  Document type  Applet usage  Server        Image load  Frames tag    Title tag with  Total Major
                                               Links   declaration    errors        connectivity  errors      usage errors  no keywords     Errors
                                                       errors                       errors                                  errors
1    Anna University                           31      115            112           51            563         150           115             1137
2    Bangalore University                      21      117            20            16            15          43            3               235
3    Bharathiar University                     29      182            46            23            104         87            182             653
4    Indira Gandhi National Open University    29      182            47            37            97          79            182             653
5    Jawaharlal Nehru Technological University 31      101            62            48            228         178           101             749
6    Jawaharlal Nehru University               56      252            12            13            27          36            122             518

Table 2



Sno  University                   TTE   BTE   ITE   FTE   HTE   FoTE  STE   StTE  FmTE  LTE   OTE   TotME
1    Anna University              1245  75    97    0     0     154   143   98    0     0     98    1910
2    Bangalore University         6543  759   941   798   521   957   1168  97    534   0     1067  13385
3    Bharathiar University        2567  168   268   241   87    369   675   97    73    43    327   4915
4    IGNOU                        1165  126   1467  114   219   347   148   26    46    32    365   4055
5    JNTU                         976   214   765   320   118   421   241   23    26    32    112   3248
6    Jawaharlal Nehru University  2987  1236  869   225   224   859   621   46    356   45    248   7716

Table 3



TTE: Table Tag Errors   BTE: Body Tag Errors   ITE: Image Tag Errors   FTE: Frame Tag Errors
HTE: Head Tag Errors   FoTE: Font Tag Errors   STE: Script Tag Errors   StTE: Style Tag Errors
FmTE: Form Tag Errors   LTE: Link Tag Errors   OTE: Other Tags Errors   TotME: Total Minor Errors

VI. Identifying Qualitative Measures for Website Design

The errors found in the Websites of various universities
show the need for qualitative measures for effective
Website design. The head tag errors (HTE), font tag errors
(FoTE) and body tag errors (BTE) identify problems in the
text elements of a Web page; thus text formatting measures
are to be evaluated. The image tag errors (ITE), body tag
errors (BTE) and image load errors identify errors in the
display of images, and hence graphics element measures are
to be evaluated. The table tag errors (TTE), frame tag
errors (FTE), style tag errors (StTE), font tag errors
(FoTE), frame tag usage errors and document type
declaration errors motivate page formatting measures. Link
tag errors (LTE) and broken links identify the need for
link formatting measures. The form tag errors (FmTE),
script tag errors (STE) and title tag with no keyword
errors identify the need for page performance measures.
The script tag errors (STE), applet usage errors, server
connectivity errors, download time of the Website and
broken link errors contribute to the need for Website
architecture measures. All these measures are shown in
Table 4.

Table 4

Sno  Measures to be evaluated    Errors considered
                                 Minor errors          Major errors
1    Text formatting measures    BTE, FTE, HTE
2    Link formatting measures    LTE                   Broken links
3    Page formatting measures    TTE, FTE, StTE, FoTE  Frame tag usage errors, document type usage errors
4    Graphics element measures   ITE, BTE              Image load errors
5    Page performance measures   FmTE, STE             Title tag with no keyword errors
6    Site architecture measures  STE                   Applet usage errors, server connectivity errors,
                                                       download time of Website, broken links

VII. Conclusion

This paper investigates the measures required for quality
Website design. A focused approach has been taken to
identify all possible errors in developing a Website,
because before any qualitative measure is used it is
necessary to verify whether all aspects of Website design
are considered in the quality assessment. This makes it
possible to judge the quality of the Web design of the
various universities and indicates where the design of a
Website needs improvement. This work can be extended to
develop a set of metrics specifically for the Websites of
higher educational institutions, using the results of the
study.

Process modeling using ILOG JViews BPMN
Modeler tool to Identify Exceptions

Saravanan M.S, Rama Sree R.J



Abstract - Today, business analysts use Business
Process Modeling Notation (BPMN) to model business process
diagrams. Business process modeling is the activity of
representing the processes of an enterprise. It allows the
business analyst to focus on the proper sequence flow of the
business processes without being concerned with how the
process is implemented; e.g., to be more concerned that a
‘Sales or Purchase’ process includes delivering or receiving
the items than with how the items will be delivered or
received. These strengths of BPMN allow businesses to increase
efficiency by automating part of their business processes and
by giving a clear representation and analysis of their
business processes, using business process diagrams (BPDs)
to identify unhandled exceptions. Business process modeling
tools provide business users with the ability to model their
business processes and to implement and execute those models.
This paper presents a simple yet instructive example of how
the ILOG JViews BPMN Modeler tool can be used to identify and
verify exceptions for a “Deliver Items” business process.

Keywords - Exception, Modeling, Notation, BPMN, BPD

I. INTRODUCTION

BPMN stands for Business Process Modeling Notation.
It is a new standard for modeling business processes.
BPMN has a diagram called the Business Process Diagram
(BPD). Business process modeling is the activity of
representing the processes of an enterprise, which allows the
business analyst to focus on the proper sequence flow of the
business processes [1]. A goal for the development of
BPMN is that the notation be simple and adoptable by
business analysts. There is also a potentially conflicting
requirement that BPMN provide the power to depict
complex business processes [2].

The BPMN business process diagram has been designed
to be easy to use and understand, but it also provides the
ability to model complex business processes [3]. The
execution of a business process often includes multiple
entries. These entries are not under the control of the
process. Because of the process details or the complexity of
the real world, the details of their behavior cannot always
be predicted. In the end, a business process may have a
single ideal execution path.

In practice, many executions of a process will encounter
events, i.e., errors or missed deadlines, that lead the
process off this path. Exception handling is not a favorite
issue for programmers or analysts. They often focus on the
likely or ideal business scenarios and end up ignoring the
handling of diverse error conditions. Therefore, we used the
BPMN tool to model the business process with unhandled
exceptions.

The main goal of BPMN is to provide a notation that is
readily understandable by all business users. This includes
the business analysts who create the initial drafts in order
to identify and verify the sequence of operations, so that
the errors, called exceptions, that may occur during product
implementation are kept to a minimum.

II. RELATED WORK 

Exception handling is one of the programming
constructs [4]. An exception occurs during the execution of a
program and interrupts the normal flow of the program’s
instructions [5]. To achieve a better-quality product,
exception handling acts as an interface between the
programmer and the language [6].

Exceptions can be identified during the software
modeling phase to avoid product failure during product
implementation. The software development industry used
its own modeling techniques without any standard; after
the introduction of BPMN, system analysts across the
software industry started using “An Industry Standard for
Process Modeling” [7].

The first commercial edition of BPMN 1.0 was released in
May 2004 [3]; in February 2006 the BPMN 1.0 specification
was released to the public and adopted as an OMG
standard [8]. Currently there are thirty-nine
companies that have implementations of BPMN.

Prior to BPMN there were many process modeling
tools and methodologies; i.e., all sorts of visual business
process flow-chart formats were used. Once business
analysts focused on BPMN’s Business Process Diagram
(BPD) [3], most software development system analysts
benefited by producing quality products with good design
and modeling. The different representations used in older
systems created some problems:

• Business analysts are required to understand multiple 
representations of business processes. 

• Business participants that don’t share the same
graphical notations might not understand one another.




Global Journal of Computer Science and Technology 



Page | 19 



• A technical gap between the format of the business 
process initial design and the format of the languages 
that will execute these business processes. 

(1) BPMN provides BPD - to be used by people who
model and manage business processes [9].

(2) BPMN provides formal mapping to an execution
language of the BPM system - to be used by System
Analysts who design the Software process execution [9].

III. THE ROLE OF EXCEPTION HANDLING IN 
BUSINESS PROCESS MODELING 

Exception handling should be a critical focus of
process modeling and analysis; assuming otherwise is, in
most cases, wrong. Exceptions therefore play a major role
in the process modeling phase. Why model business
processes? When should we use exceptions? And can a
business analyst model exception handling? We discuss
these questions in the following sections.

A. Why Model Business Processes? 

Companies are finding many reasons to capture their
business processes. Companies that have merged want to
examine processes across their lines of business to discover
which one is the best of breed. Other companies are looking
to improve their existing processes, or even to automate
them. In some countries, government regulations require
that business processes be properly documented. For
example, in the United States some companies are required
to document certain processes well. These are among the
many factors in the business world today that are making
companies take a closer look at their business
processes [10].

B. When should we Use Exceptions? 

The simple answer is: “whenever the semantic and
performance characteristics of exceptions are appropriate”.
An oft-cited guideline is to ask ourselves the question “Is
this an exceptional or unexpected situation?” This guideline
has an attractive ring to it, but is usually a mistake. The
problem is that one person’s “exceptional” is another’s
“expected”: when you really look at the terms carefully, the
distinction evaporates and you are left with no guideline.
After all, if you check for an error condition, then in some
sense you expect it to happen, or the check is wasted code.

A more appropriate question to ask is: “do we want stack 
unwinding here?” Because actually handling an exception is 
likely to be significantly slower than executing mainline 
code, you should also ask: “Can I afford stack unwinding 
here?” For example, a desktop application performing a long 
computation might periodically check to see whether the 
user had pressed a cancel button. Throwing an exception 
could allow the operation to be cancelled gracefully. On the 
other hand, it would probably be inappropriate to throw and 
handle exceptions in the inner loop of this computation 
because that could have a significant performance impact. 
The guideline mentioned above has a grain of truth in it: in
time critical code, throwing an exception should be the
exception, not the rule [11].
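
The cancellation example from the text can be sketched in a few lines. Python is used here only for illustration; the cost argument above concerns languages with stack unwinding, and the names `Cancelled`, `sum_of_squares` and `run` are hypothetical. The point of the sketch is where the exception is placed: the cancel request is checked once per chunk of work, and nothing is thrown or handled inside the inner loop.

```python
class Cancelled(Exception):
    """Raised when the user asks to stop a long computation."""


def sum_of_squares(chunks, cancel_requested):
    total = 0
    for chunk in chunks:
        # Checked once per chunk, which is cheap compared with the work below.
        if cancel_requested():
            raise Cancelled(f"stopped with partial total {total}")
        # Inner loop: no exceptions are thrown or handled here.
        for value in chunk:
            total += value * value
    return total


def run(chunks, cancel_requested):
    """Run the computation; return None if the user cancelled it."""
    try:
        return sum_of_squares(chunks, cancel_requested)
    except Cancelled:
        # One handler at the top lets the operation end gracefully.
        return None
```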

C. Can Business Analyst Model Exception Handling?

It is conventional wisdom in business that 80 percent of
the problems are caused by 20 percent of the work.
Certainly the designers of the Business Process Modeling
Notation standard had exception handling in mind from the
start. BPMN introduces to process modeling the notion of
intermediate events. It's a god-awful name but an absolutely
essential concept for making exception handling visible in
the process diagram and understandable to business people.
In fact, events are the key difference between BPMN and
traditional flowcharting.

In BPMN, intermediate events are drawn as circles with a 
double border, with a symbol inside denoting the type of 
event: receipt of an external signal message, a timeout, a 
system fault, etc. When drawn attached to the border of a 
process activity or sub process, the semantics of BPMN say 
that the activity or sub process is interrupted immediately, 
and the process continues along the flow path leading out 
from the intermediate event. That path is called "exception 
flow." If the event never occurs, the activity or sub process 
completes normally and processing continues along the flow 
path leading out from it. That path is called "normal flow." 
Is that hard to understand? No, I didn't think so. You'd be 
surprised then to know that a number of modeling tools 
offered by Business Process Management System (BPMS) 
vendors that advertise themselves as BPMN compliant don't 
support intermediate events. What those vendors usually say 
is that the concept is too technical for business analysts to 
understand. What they really mean, in most cases, is that their 
process engine, which executes the model to automate and monitor 
the process flow, can’t handle them. The modeling tool’s 
simulation engine has no idea what to do with them, either. 
So they’re just left out of the tool. Shame on Object 
Management Group (OMG) for allowing this pseudo- 
BPMN to reproduce as it has. 
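
The normal-flow and exception-flow semantics described above can be mimicked in ordinary code. The sketch below is only an analogy, under the assumption that an attached intermediate event behaves like an exception interrupting the activity; `BoundaryEvent` and `run_activity` are hypothetical names, not part of BPMN or of any tool discussed here.

```python
class BoundaryEvent(Exception):
    """Hypothetical stand-in for an intermediate event attached to an activity."""

    def __init__(self, trigger):
        super().__init__(trigger)
        self.trigger = trigger  # e.g. "message", "timeout", "fault"


def run_activity(activity, normal_flow, exception_flow):
    """Run an activity and continue along exactly one outgoing path."""
    try:
        result = activity()
    except BoundaryEvent as event:
        # The event occurred: the activity is interrupted and processing
        # continues along the exception flow.
        return exception_flow(event.trigger)
    # The event never occurred: the activity completed normally.
    return normal_flow(result)


def ship_order():
    return "order shipped"


def ship_order_timed_out():
    raise BoundaryEvent("timeout")
```

Calling `run_activity` with `ship_order` follows the normal flow, while `ship_order_timed_out` follows the exception flow.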

The traditional alternative to modeling exception handling 
explicitly in the process diagram is to do it in code, toss it 
over the wall to IT. Once in a while this might be necessary, 
but as a general principle it's just plain wrong. By removing 
exception handling from the process model and burying it in 
implementation code, you’ve not only lost visibility into 
what’s going on, you’ve lost the ability to use it in 
simulation analysis, you’ve lost agility, reuse, shared best 
practices, etc. All of those assume exception handling is in 
the process model [12]. 

IV. BPMN TOOLS HISTORY 

In current working routines, the business analyst typically 
models a diagram on a piece of paper, and so does not get the 
assistance that only quality software has to offer. We 
reviewed several business process modeling applications: 
ILOG JViews BPMN Modeler tool by ilog.com [13], 
BizAgi Process Modeler by bizagi.com [14], eBPMN 
Designer by soyatec.com, Business Process Visual 
ARCHITECT by visual-paradigm.com, Business Process 
Modeler for Business Analysts by eClarus.com, Tibco 
Business Studio by Tibco.com, and Intalio Designer by 
intalio.com. 

These modeling tools follow the BPMN specification. 
Some may add extensions that fit their users’ needs. They 
employ static verification to force the user to keep within 
the BPMN constraints; much like a word processor employs 
spell checking to warn against mistakes. Since the BPMN 
specification provides that exceptions must be handled, these 
tools, particularly the ILOG JViews BPMN Modeler tool [13], 
force their users to handle exceptions. 

A. A First Look at ILOG JViews BPMN Modeler Tool 

The ILOG JViews BPMN Modeler has been released in several 
versions. 

The latest version of the tool, ILOG JViews BPMN Modeler 1.1.2, 
was introduced at the beginning of 2009. It is very 
easy to use; in a matter of minutes you will be able to begin 
defining your processes and collaborate with other people in 
your organization. 

Generally, BPMN specified a single business process 
diagram, called the Business Process Diagram (BPD) [3]. 
This diagram was designed to do two things well. 

First, it is easy to use and understand. You can use it to 
quickly and easily model business processes, and it is easily 
understandable by non-technical users, usually management 
[15]. 

Second, it offers the expressiveness to model very 
complex business processes, and can be naturally mapped to 
business execution languages [15]. To model a business 
process flow, you simply model the events that occur to start 
a process, the processes that get performed, and the end 
result of the process flow. Business decisions and branching 
of flows is modeled using gateways. A gateway is similar to 
a decision symbol in a flowchart. 

Furthermore, a process in the flow can contain sub-processes, 
which can be graphically shown by another 
business process diagram connected via a hyperlink to a 
process symbol. If a process is not decomposed into 
sub-processes, it is considered a task, the lowest-level process. A 
‘+’ mark in the process symbol denotes that the process is 
decomposed; if it doesn’t have a ‘+’ mark, it is a task. 

As you delve further into business analysis, you can 
specify ‘who does what’ by placing the events and processes 
into shaded areas called pools that denote who is performing 
a process. You can further partition a pool into lanes. A pool 
typically represents an organization and a lane typically 
represents a department within that organization, although 
you may make them represent other things such as 
functions, applications, and systems. 

B. ILOG JViews BPMN Modeler Tool Events and Notations 

The ILOG JViews BPMN Modeler tool follows the 
general modeling notations supported by business process 
modeling. During business process modeling, you model the 
events that happen in the business, and show how they affect 
process flows. An event either kicks off a process flow [16], 
or happens during a process flow, or ends a process flow. 
BPMN provides a distinct notation for each of these types of 
events, shown in Table I below. 



TABLE I. BASIC EVENT TYPES IN BPMN AND THEIR NOTATIONS. 

Start Event | Intermediate Event | End Event 
Starts a process flow. | Happens during the course of a process flow. | Ends a process flow. 
[Start Event icon] | [Event icon] | [End Event icon] 



When you model more complex process flows, such as B2B 
web services, you need to model more complex business 
events, such as messages [16], timers [16], business rules 
[16], and error conditions. BPMN enables you to specify the 
trigger type of the event, and denote it with a representative 
icon, as specified in Table II. Specifying a trigger type for an 
event puts certain constraints on the process flow that you 
are modeling, which are explained in the table. For example, 
a timer cannot end a process flow. You can only draw 
message flows from and to message events. These types of 
modeling rules, which are actually kinds of business rules, 
should be enforced automatically by the modeling tool 
providing support for BPMN. Oftentimes an event happens 
while a particular process is being performed, causing an 
interrupt to the process, and triggering a new process to be 
performed. The process will complete, causing an event to 
start, and a new process to be performed. You can model 
these intermediate events by placing an event symbol 
directly on the process. The events toolbar of the ILOG 
JViews BPMN Modeler gives access to the several types of 
event that can occur within a BPMN process, for example 
message, timer, exception, cancel, compensation, rule, link, 
multiple, signal, and terminate. 

TABLE II. ADVANCED EVENT TRIGGER TYPES IN BPMN AND THEIR NOTATIONS. 

Start Events | Intermediate Events | End Events | Description 
Start Message | Message | End Message | A start message arrives from a participant and triggers the start of the process, or continues the process in the case of an intermediate event. An end message denotes a message generated at the end of a process. 
Start Timer | Timer | A Timer cannot be an End Event. | A specific time or cycle, for example every Monday at 9am, can be set to trigger the start of the process, or continue the process in the case of an intermediate event. 
Start Rule | Rule | A Rule cannot be an End Event. | Triggers when the conditions for a rule become true, such as “Stock price changes by more than 10% since opening.” 
Start Link | Link | End Link | A link is a mechanism for connecting the end event of one process flow to the start event of another process flow. 
Start Multiple | Multiple | End Multiple | For a start multiple event, there are multiple ways of triggering the process, or continuing the process in the case of the intermediate event. Only one of them is required. The attributes of the event define which of the other types of triggers apply. For end multiple, there are multiple consequences of ending the process, all of which will occur, for example, multiple messages sent. 
An Exception cannot be a Start event. | Exception | End Exception | An end exception event informs the process engine that a named error should be generated. This error will be caught by an intermediate exception event. 
A Compensation event cannot be a Start event. | Compensation | End Compensation | An end compensation event informs the process engine that compensation is necessary. This compensation identifier is used by an intermediate event when the process is rolling back. 
An End event cannot be a Start event. | An End event cannot be an Intermediate event. | End Cancel | An end event means that the user has decided to cancel the process. The process is ended with normal event handling. 
An End Kill event cannot be a Start event. | An End Kill event cannot be an Intermediate event. | End Kill | An end kill event means that there is a fatal error and that all activities in the process should be immediately ended. The process is ended without compensation or event handling. 
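
The placement constraints in Table II are the kind of rule a modeling tool can enforce automatically. The following is a minimal sketch, assuming the constraints exactly as transcribed from Table II; `ALLOWED_POSITIONS` and `check_event` are hypothetical names and do not reflect the API of ILOG JViews or any other tool.

```python
# Positions in which each trigger type may appear, transcribed from Table II.
ALLOWED_POSITIONS = {
    "message": {"start", "intermediate", "end"},
    "timer": {"start", "intermediate"},          # a Timer cannot be an End event
    "rule": {"start", "intermediate"},           # a Rule cannot be an End event
    "link": {"start", "intermediate", "end"},
    "multiple": {"start", "intermediate", "end"},
    "exception": {"intermediate", "end"},        # cannot be a Start event
    "compensation": {"intermediate", "end"},     # cannot be a Start event
    "cancel": {"end"},
    "kill": {"end"},
}


def check_event(trigger, position):
    """Return True if the trigger type is allowed at the given position."""
    allowed = ALLOWED_POSITIONS.get(trigger)
    if allowed is None:
        raise ValueError(f"unknown trigger type: {trigger!r}")
    return position in allowed
```

For instance, `check_event("timer", "end")` is rejected, matching the rule that a timer cannot end a process flow.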



C. ILOG JViews BPMN Modeler Tool Activity Classification Legend 

To apply the ILOG JViews BPMN Modeler tool to any 
type of business process, the following activity classification 
legend is used. 

1. Query data (Example: Find Orders) 

2. Enter data (Example: Log received items) 

3. Update data (Example: Log received items) 

4. Produce data (Example: Perform Regression 
Test) 

5. Send notification (Example: Notify customer 
RMA number is invalid) 

6. Receive notification (Example: Receive Report 
State of Accounts) 

7. Send and Receive data (Example: Notify 
customer) 

8. Analyze data (Example: Allocate Defects) 

9. Perform action (Example: Negotiate return) 

s - Sub-Process 

g - Gateway 

x - Complex activity 

D. ILOG JViews BPMN Modeler Tool Exceptions Legend 

To apply the ILOG JViews BPMN Modeler tool to any 
business process, the following exception legend is used. 
QF - Query failed 

UF - Update failed 

UR - Update rejected 

SF - Send failed 

SR - Send rejected 

RR - Receive rejected (data rejected) 

RR - Receive rejected (authorization) 

RR - Receive rejected (authentication) 

RR - Receive rejected (security) 

NR - Notification rejected 
VR - Analysis/Verification rejected 
AF - Action failed 

TO - Timeout 

Org - The original business process diagram, modeled using a regular modeling tool 
Enh - A user of the ILOG JViews BPMN Modeler tool with suggestions 
Exp - An industry expert 
#Sug - Number of suggestions for exceptions, given by the enhanced tool, according to classification 
#Sel - Number of exceptions the user of the enhanced tool selected from the suggestions. 

V. EXCEPTION HANDLING EXPERIMENT RESULTS USING ILOG JVIEWS BPMN MODELER TOOL 

To demonstrate the importance of the ILOG JViews BPMN 
Modeler tool for modeling business processes, let us take one 
business process, “Deliver Items”, from a business 
company. The ILOG JViews BPMN Modeler tool is also 
used specifically to identify and verify the exceptions of any 
business process without difficulty or overhead. 

The results show the usage of the ILOG JViews BPMN Modeler 
tool and the importance of modeling a business process in order to 
produce a quality model for the further phases of product 
development. The tool organizes the processes with the help of 
business process diagrams. 

A. Activity classification with possible exceptions 

The “Deliver Items” process has the following table of 
Activity classification with probable exceptions. 



TABLE III. ACTIVITY CLASSIFICATION WITH POSSIBLE EXCEPTIONS. 

Business Activity | Classification | Probable Exceptions 
Process: Deliver Items 
Delivery with invoice | 9 | AF 
Update invoice | 2 | UR 
Terminate delivery | 9 | - 
(pay on delivery?) | g | - 
Delivery | 9 | - 
Receive payment | 9;2 | - 
Follow up | 9 | - 





B. Business process diagram of the “Deliver Items” process 

Fig. 1 shows the Business Process Diagram developed for the 
“Deliver Items” process using the ILOG JViews BPMN Modeler 
tool with different notations. 

Fig. 1. Experiment input: “Deliver Items” 

C. Result of the “Deliver Items” process with exceptions 

The following table shows the “Deliver Items” process with 
the exceptions identified and verified at its various stages. 
It is the outcome of modeling the process with the ILOG JViews 
BPMN Modeler tool, and it agrees with the opinions of experts or 
system analysts from different software development companies 
around the world on the possibility of exceptions or process 
failures. 



TABLE IV. RESULT OF “DELIVER ITEMS” PROCESS WITH EXCEPTION 

Business Activity | Clss | #sug | #sel | Org | Enh | Exp 
Delivery with invoice | 9 | 1 | 1 | AF | AF | AF 
Update invoice | 2 | 1 | 1 | - | UR | - 
Terminate delivery | 9 | 1 | 0 | - | - | - 
(pay on delivery?) | g | - | - | - | - | - 
Delivery | 9 | 1 | 0 | - | - | - 
Receive payment | 9;2 | 2 | 0 | - | - | TO 
Follow up | 9 | 2 | 0 | - | - | - 









In the above “Deliver Items” process, the exceptions 
identified (Enh) by the ILOG JViews BPMN Modeler tool 
mostly correlate with the expert (Exp) opinion and with the 
regular modeling process of software development companies 
(Org). For example, for the “Delivery with invoice” activity, 
the “Action failed” exception was produced or advised in all 
three cases. So, according to the expert’s opinion, using the 
ILOG JViews BPMN Modeler tool produces more accurate 
business processes. That is, most of the exceptions that the expert 
considered probable are handled in the resulting business 
process. 

VI. CONCLUSION 

In this paper, we first looked at process modeling and then 
at the importance of exceptions and the role of BPMN tools 
for the business analyst. Our experiment, based on the 
“Deliver Items” business process, indicates that the ILOG 
JViews BPMN Modeler tool is very useful for identifying and 
verifying exceptions at the time of modeling the product, 
which helps to produce a good product in terms of performance 
and quality. An additional benefit is that it reduces program 
failures and eases code construction and product testing at the 
quality-of-service stage. The ILOG JViews BPMN Modeler 
tool thus assists in producing robust business processes during 
product modeling. That is, most of the exceptions that 
the expert considered probable are handled in the resulting 
business process, unlike the case of the regular tool. 

A new approach to: Obstacle-Avoiding Rectilinear Steiner Tree Construction 

Animesh Pant a 
NIT Raipur 



Abstract - Given a set of pins and a set of obstacles on a plane, 
an obstacle-avoiding rectilinear Steiner tree (OARST) connects 
these points, possibly through some additional points (called 
Steiner points), and avoids running through any obstacle to 
construct a tree with a minimal total wire length. The OARST 
problem has received dramatically increasing attention 
recently. Nevertheless, considering obstacles significantly 
increases the problem complexity. Based on the Obstacle-Avoiding 
Spanning Graph (OASG), an edge-based heuristic method has 
been applied to find the rectilinear Steiner tree with minimum 
wire length. 

I. Introduction to the Problem: 

The problem of creating Obstacle-Avoiding 
Rectilinear Steiner Trees can be stated by the following 
steps: 

Step 1: User-defined inputs are taken from input files which 
contain the set of nodes and the set of obstacles. 

Step 2: Using a spanning graph generation algorithm, an 
Obstacle-Avoiding Rectilinear Spanning Graph (OARSG) 
is generated. 

Step 3: Prim’s algorithm is applied to find the Minimum 
Spanning Tree (MST) from the OARSG. 

Step 4: Finally, an edge-based heuristic is used on the 
minimum spanning tree to create the Obstacle-Avoiding 
Rectilinear Steiner Tree. 

II. Assumptions: 

1. Obstacles are of rectangular shape only. 

2. No node can be inside the obstacle. Nodes can be either 
outside the obstacle boundaries or on the boundary of the 
obstacle. 

3. Obstacles should not overlap each other. 

III. Algorithms 

1) Constructing the Hanan Grid 
A non-uniform two-dimensional routing grid is constructed 
by drawing horizontal and vertical lines through every node and 
the boundaries of the obstacles, as shown in Fig. 1(b). These 
lines are extended from terminals and corners of all 
obstacles in both horizontal and vertical directions until 
blocked by any obstacle or boundary of the design. 




Fig. 1(a) Fig. 1(b) 

Fig. 1(a) A design instance with terminals and obstacles. 

Fig. 1(b) A non-uniform routing grid (Hanan Grid) 
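
The grid construction can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes obstacles are given as hypothetical tuples `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`, draws full grid lines through every node and obstacle corner, and then discards grid points strictly inside an obstacle instead of stopping each line where it is blocked.

```python
def hanan_grid_lines(nodes, obstacles):
    """Return the sorted x and y coordinates of the grid lines."""
    xs = {x for x, _ in nodes}
    ys = {y for _, y in nodes}
    for x1, y1, x2, y2 in obstacles:
        xs.update((x1, x2))  # vertical lines through obstacle corners
        ys.update((y1, y2))  # horizontal lines through obstacle corners
    return sorted(xs), sorted(ys)


def grid_points(nodes, obstacles):
    """Grid intersections, excluding points strictly inside an obstacle."""
    xs, ys = hanan_grid_lines(nodes, obstacles)

    def strictly_inside(x, y):
        return any(x1 < x < x2 and y1 < y < y2
                   for x1, y1, x2, y2 in obstacles)

    return [(x, y) for x in xs for y in ys if not strictly_inside(x, y)]
```

Points on an obstacle boundary are kept, in line with the assumption that nodes may lie on the boundary of an obstacle.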

2) Constructing the Obstacle-Avoiding Rectilinear Spanning Graph 

A spanning graph is constructed considering the obstacles. 
We connect each pair of nodes in the input set through the 
Manhattan path, taking one node as the source and the other 
as the destination, and check whether the Manhattan path is 
obstructed by any obstacle. If the path is obstructed by 
an obstacle, we find the path to the nearest 
node of the opposite edge of the obstacle (opposite to 
the edge making the obstruction), take that path, 
and assign that node as the new source node. We then find 
the Manhattan path again from the changed source to the 
destination point. If this path is also obstructed by an 
obstacle, the same method is applied. 






Fig (a) Manhattan paths being obstructed, Fig (b) The path 
being followed in case of an obstruction. 
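
The obstruction test on the two Manhattan paths between a pair of nodes can be sketched as below. This is an illustrative sketch under stated assumptions: obstacles are hypothetical tuples `(x1, y1, x2, y2)`, a path counts as obstructed only when it passes through the open interior of a rectangle (running along a boundary is allowed), and the function names are not from the paper.

```python
def segment_hits_rect(p, q, rect):
    """True if the axis-parallel segment pq crosses the interior of rect."""
    (px, py), (qx, qy) = p, q
    x1, y1, x2, y2 = rect
    if py == qy:  # horizontal segment
        return y1 < py < y2 and min(px, qx) < x2 and max(px, qx) > x1
    if px == qx:  # vertical segment
        return x1 < px < x2 and min(py, qy) < y2 and max(py, qy) > y1
    raise ValueError("segment must be horizontal or vertical")


def manhattan_paths(src, dst):
    """The two L-shaped paths between src and dst, as lists of points."""
    return [[src, (dst[0], src[1]), dst],   # horizontal first, then vertical
            [src, (src[0], dst[1]), dst]]   # vertical first, then horizontal


def path_is_blocked(path, obstacles):
    """True if any segment of the path crosses any obstacle."""
    return any(segment_hits_rect(a, b, rect)
               for a, b in zip(path, path[1:])
               for rect in obstacles)
```

When a path is blocked, the procedure described above would redirect the source to a corner of the blocking obstacle and repeat the test.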

IV. Pseudo code for generation of Obstacle-Avoiding Rectilinear Spanning Graph 

INPUT: Coordinate points of the set of nodes and obstacles 

OUTPUT: Set of edges forming the Obstacle-Avoiding Rectilinear Spanning Graph 

For every pair of nodes: 
    source = node1, destination = node2; 
    Spanning_graph(source, destination) 

Spanning_graph(source, destination) 
{ 
    Find the Manhattan paths between the source node and the destination node 
    For (each Manhattan path) 
    { 
        if (edge is obstructed by an obstacle) 
        { 
            Traverse till the nearest node of the opposite edge 
            of the obstacle that is creating an obstruction; 
            Set source node = nearest node of opposite edge 
            of obstacle creating an obstruction; 
            Spanning_graph(source, destination); // recursive call to the function 
        } 
        else Traverse through this Manhattan path; 
    } 
    Return (minimum of the two Manhattan paths); 
} 

a For promoting international students research forum only 

Global Journal of Computer Science and Technology 

V. Algorithm for constructing minimum spanning tree 

After generating the Obstacle Avoiding Spanning Graph 
(OASG) from the above step we apply the Prim’s Algorithm 
on this OASG to get the minimum spanning tree. 

In general, minimum spanning tree (MST) uses the 
Manhattan distance between two terminals as cost. 
However, it will not be accurate in estimating the real 
routing distance with the existence of the obstacles. In order 
to have the real routing distance the distance of the whole 
path between the source and the destination is found. 
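
A minimal sketch of Prim's algorithm on such a graph is given below. The graph representation (a dictionary of neighbor-to-cost dictionaries) and the name `prim_mst` are assumptions for illustration; the edge costs are taken to be the real obstacle-avoiding routing distances rather than plain Manhattan distances.

```python
import heapq


def prim_mst(graph, start):
    """Return MST edges (u, v, cost) of a connected undirected graph.

    graph: dict mapping node -> {neighbor: routing_cost}
    """
    visited = {start}
    heap = [(cost, start, v) for v, cost in graph[start].items()]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(graph):
        cost, u, v = heapq.heappop(heap)  # cheapest edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, cost))
        for nxt, nxt_cost in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (nxt_cost, v, nxt))
    return tree


# Example: the A-C cost (7) includes a detour around an obstacle.
example = {
    "A": {"B": 4, "C": 7},
    "B": {"A": 4, "C": 2},
    "C": {"A": 7, "B": 2},
}
```

With the example graph, the tree uses edges A-B and B-C and avoids the obstructed A-C connection.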



VI. Algorithm for generating Obstacle-Avoiding Rectilinear Steiner Tree 

INPUT: The edges of the MST defined by the end points 
and the bend. 

OUTPUT: The Steiner nodes and the edges of the Steiner 
Tree defined by the end points and the bend. 

Steps: 

1. Let the rectilinear minimum spanning tree of the set of 
nodes be denoted as T. 

2. for (nodes in T) 
   { 
       While (it is possible to drop a normal on edges of T) 
       { 
           If (normal on edge of T is obstructed) 
               Then continue the loop; 
           Find the longest edge in the cycle formed due to the normal; 
           If ((cost of normal) < (cost of longest edge)) 
               Then replace the pair of edges with 3 new edges 
               and a new node; 
       } 
   } 

3. The new nodes thus formed are the Steiner nodes. 

4. The new tree thus formed is the Steiner tree, as shown in the 
following figure. 

Fig. 2A 

Fig. 2B 



Consider the rectilinear MST shown in Fig. 2A. To draw the 
rectilinear Steiner tree we use the edge-based heuristic. 
We drop a perpendicular from node A to the rectilinear 
component of the edge E at the point S. If this perpendicular 
is obstructed by an obstacle, we stop the process 
and continue with another possible perpendicular. If this 
perpendicular is obstacle free, a cycle is created, as 
shown in Fig. 2B, and the longest edge in the cycle is edge F. 
Since the length of edge F is greater than that of the edge AS, we 
consider S as a Steiner node. The edges E and F are 
deleted and 3 new edges AS, CS and BS are drawn. The 
changed tree is shown in Fig. 2C. 

Fig. 2C 



Next we will search for an edge on which it is possible to 
draw a perpendicular from node A. If we get such an edge 
then we will repeat the above process again and we will get 
one more Steiner node. Similarly we will repeat the process 
for all the nodes to get the rectilinear Steiner tree with 
minimum cost. 
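
One step of this heuristic can be sketched as follows. The sketch is a simplification under stated assumptions: edge E is a straight horizontal or vertical segment, the longest cycle edge F is supplied by the caller, and the obstacle check on the perpendicular is omitted. The function names are hypothetical.

```python
def manhattan(p, q):
    """Rectilinear distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def foot_of_normal(a, edge):
    """Foot S of the perpendicular from a onto a straight edge, or None."""
    b, c = edge
    if b[1] == c[1] and min(b[0], c[0]) <= a[0] <= max(b[0], c[0]):
        return (a[0], b[1])  # horizontal edge
    if b[0] == c[0] and min(b[1], c[1]) <= a[1] <= max(b[1], c[1]):
        return (b[0], a[1])  # vertical edge
    return None


def try_steiner_point(a, edge_e, edge_f):
    """Return (S, new_edges) if replacing E and F shortens the tree."""
    b, c = edge_e
    s = foot_of_normal(a, edge_e)
    if s is None or s in (a, b, c):
        return None
    if manhattan(a, s) >= manhattan(*edge_f):
        return None  # the normal is not cheaper than the longest edge
    return s, [(a, s), (b, s), (c, s)]
```

For A = (2, 3), E = ((0, 0), (5, 0)) and F = ((2, 3), (0, 0)), the Steiner node is S = (2, 0) and the wire length drops from 10 to 8.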

VII. Advantages of Project 

1. This project can be used to make an Obstacle Avoiding 
rectilinear Steiner tree, so that it can reduce the wire length 
of the connection of the nodes. 

2. Even in extreme scenarios, the algorithm gives a near-optimal 
solution for the Steiner tree construction problem. 

3. It has already been proved that there exists at least one 
minimum Steiner tree in the non-uniform routing grid if the 
Steiner tree problem is solvable. 

VIII. Criticism 

1. The algorithm does not work for the overlapping 
obstacles. 

2. The algorithm does not work for obstacles other than 
rectangular shape. 



IX. Conclusions and future work: 

This implementation is very simple to understand and easy 
to use, since it uses conventional data structures like struct, 
arrays, procedural functions and procedural code. In this 
project, finding a solution to the problems was more 
important than the running time. We believe that the 
implementation and methods in our work can help the 
design of routing tools in the future. The code can be used to 
reduce wire length of the tree without hitting the obstacles. 
The running time of this algorithm is O(n^2), so a more 
efficient algorithm, with better running time as well as 
shorter wire length, could be developed for the Obstacle-Avoiding 
Rectilinear Minimum Spanning Tree. 

Algorithmic Approach for Creating and Exploiting Flexibility in Steiner Trees 

Abstract - Routing is an important task in VLSI design, and 
rectilinear Steiner minimal tree (RSMT) construction is a 
fundamental research issue in the context of routing. Given a 
set of terminals, the RSMT problem is to find a rectilinear 
tree of minimal length that connects all the 
terminals, possibly through some additional points (called 
Steiner points). In practice, rectilinear 
Steiner trees are used to route signal nets by global and detail 
routers. The Steiner tree problem is not only a routing problem 
in computer networks; it can also be used in designing proper 
road and airway routes. The concept of minimization of Steiner 
trees has practical applications in VLSI routing and wire 
length estimation, as all require minimization of intersections. 
Minimization of intersections can be achieved by creating and 
exploiting flexibility in Steiner trees: producing flexibility 
in an RST can yield a tree with a minimum number of 
intersections. The new, flexible tree is guaranteed to have the 
same total length. Any existing Steiner tree algorithm can be 
used for the initial construction of the Steiner tree. While 
solving for the flexibility in a Steiner tree, problems like dealing 
with overlaps have to be tackled, and the flexibility has to be 
maximized. 



I. Introduction to the Problem 

The problem of creating and exploiting flexibility in 
Steiner trees can be stated by the following steps: 

Step 1: User-defined inputs are taken from input files which 
contain edge sets and Steiner nodes. The input files are 
generated by some pre-defined module for generation of 
Steiner trees. 

Step 2: Using the unstable-to-stable rectilinear Steiner tree 
generation algorithm, a stable Steiner tree is generated. 

Step 3: The algorithms to obtain the parallel, flexible, and 
movable edges are applied to obtain the required edges. 

Step 4: The type of overlap is found, whether it is a type 1 or 
type 2 overlap. 

Step 5: Finally, the Generate Steiner tree algorithm is used to 
create flexibility in the Steiner tree. 

II. Algorithms 

1) Unstable Steiner tree to stable Steiner tree generation. 

The major condition required for applying the 
conditions of flexibility is that the rectilinear Steiner tree has 
to be stable. An RST is said to be stable under rerouting if 
there is no pair of degenerate or non-degenerate L-shaped 
segments whose enclosing boxes intersect or overlap, 
except touching at a common end point (if any) of the two 
segments. No matter how we reroute an L-shaped segment in 
a stable RST within its enclosing box, no overlaps or 
crossings will occur, and the RST will essentially remain 
unchanged. A stable RST corresponds to a local minimum 
under the rerouting operation. 

An RST is stable if there is no pair of edges such that their 
bounding boxes intersect or overlap except at a common 
endpoint (if any) of the two edges. Equivalently, a stable 
RST will not have overlaps when the edges are routed with 
minimum length. 

[Figure: bounding box of edge e] 

[Figure: unstable RST and stable RST] 
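
The stability condition (no two edges whose bounding boxes intersect or overlap, except at a common endpoint) can be checked directly. The following is a minimal sketch, assuming each edge is a hypothetical pair of endpoint tuples `((x1, y1), (x2, y2))`; it is a brute-force pairwise test, not the authors' implementation.

```python
def bounding_box(edge):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of an edge."""
    (x1, y1), (x2, y2) = edge
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def boxes_conflict(e1, e2):
    """True if the boxes share more than a common endpoint of both edges."""
    ax1, ay1, ax2, ay2 = bounding_box(e1)
    bx1, by1, bx2, by2 = bounding_box(e2)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    if ix1 > ix2 or iy1 > iy2:
        return False  # the boxes are disjoint
    if ix1 == ix2 and iy1 == iy2:
        # A single shared point is allowed only if it is a common endpoint.
        return (ix1, iy1) not in (set(e1) & set(e2))
    return True


def is_stable(edges):
    """True if no pair of edges has conflicting bounding boxes."""
    return not any(boxes_conflict(a, b)
                   for i, a in enumerate(edges)
                   for b in edges[i + 1:])
```

The pairwise test takes quadratic time in the number of edges, which is acceptable for a sketch.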

Pseudocode for generation of stable Steiner tree: 

Only when there are no degenerate segments (not edges) 
such that their bounding boxes overlap: 

For (each segment (not edge)) 
{ 
    // let the segment be n1n2, 
    // where n1 and n2 denote the nodes of the segment considered 
    If (the segment is neither vertical nor horizontal)    (1) 
    { 
        Declare two temporary nodes N1’ and N2’ such that 
            N1’.x = n1.x, N1’.y = n2.y 
            N2’.x = n2.x, N2’.y = n1.y 
        // let A be the set of all segments which are adjacent to the 
        // segment under consideration 
        // let the segments belonging to the set be denoted by n3n4 
        // (the nodes n3 and n4 are variables) 
        If (((n3.x <= n1.x <= n4.x) && (n3.y <= n1.y <= n4.y)) || (similarly for n2) || (similarly for N1’) || (similarly for N2’)) 
        { 
            // There exists an overlap 
            If (segment n3n4 is L-shaped) 
            { 
                Flip n3n4 
            } 
            If (segment n3n4 is not L-shaped and segment n1n2 is L-shaped) 
            { 
                Flip n1n2 
            } 
        } 
    } 
    If (segment is either vertical or horizontal) 
    { 
        // The above process remains the same 
        // Only the number of comparisons reduces to 2 
        If (n3n4 is vertical or horizontal) 
        { 
            No overlapping 
        } 
        If (n3n4 is L-shaped) 
        { 
            Then perform the steps in (1), but the number of comparisons 
            would reduce to 2 
        } 
    } 
} 

2) Algorithm for getting movable, parallel and flexible edges. 

Flexible edge: Flexible edges can be generated by moving 
the movable edges in the RST. 

Movable edge: These are special edges with the following 
properties: 

• Steiner-to-Steiner edge. 

• Edge degree of each Steiner point is 3. 

• Parallel edges exist at both ends. 

• Flexible candidate exists at least at one end. 

To get the movable edges, we try to locate an edge that lies 
between two Steiner points. If such an edge is found, we 
check whether it has parallel edges; if it does, and it also 
has either one or two flexible edges, then that edge is 
considered to be a movable edge. To get a flexible edge, we 
locate a movable edge; if the movable edge is horizontal, we 
try to find the adjacent horizontal edge. If such an edge 
exists, then that edge is considered to be a flexible edge. 

To get a set of parallel edges, we locate edges that are 
perpendicular to the movable edges and also pass through 
their pair of Steiner points. 

Pseudocode for finding parallel, movable and flexible edges: 

Checking for flexibility and movability 
For (each segment) 
{ 
    CHECK_FOR_MOVABILITY 
    { 
        If (the segment is between two Steiner points) 
        { 
            If (the Steiner edge is horizontal) 
            { 
                Check for s1; 
                { 
                    If (there exists only one edge node n1 such that n1.y = s1.y) 
                    { 
                        One of the parallel edges is n1s1; 
                    } 
                    If (there exist two edge nodes n1 and n2 such that n1.y = n2.y = s2.y) 
                    { 
                        One of the parallel edges is n1n2; 
                    } 
                    If (there is no single edge node with any of the above two) 
                    { 
                        There is no parallel edge, hence continue; // go to 1 
                        // if a parallel edge is found let it be S1 
                    } 
                } 
                Check for s2; 
                { 
                    Similarly check for single or double nodes for s2 
                    // if a parallel edge through s2 is found let it be S2 
                    // let the point of the parallel edge with lower y coordinate be 
                    // denoted by S.l 
                    // and the one with higher y coordinate by S.h 
                } 
                If (parallel edge is found through both s1 and s2) 
                { 
                    If ((S1.l != S2.h) || (S1.h != S2.l)) 
                    { 
                        The edge set is movable 
                        So the movable set can be created; 
                    } 
                } 
            } 
            If (the Steiner edge is vertical) 
            { 
                The process remains the same 
                But here the x coordinate has to be compared 
                And for verifying the parallel edges use the following; 
                // let the point of the parallel edges with lower x coordinate be S.l 
                // the one with the higher x coordinate be S.h 
            } 
        } 
    } 
    CHECK_FOR_FLEXIBILITY 
    { 
        If both the Steiner points of the considered movable edge 
        are not T-points simultaneously, then the considered movable edge 
        is movable and flexible 
        Hence the above edge and its adjacent edges can be entered into 
        the set of movable and flexible edges 
    } 
} 

3) Algorithm for generating flexible Steiner tree. 

Once we have obtained a stable minimum rectilinear Steiner 
tree, along with the parallel, movable, and flexible edges, 
we can apply the Generate flexible tree pseudocode 
to generate flexibility in the Steiner tree. For exploiting the 
flexibility of the Steiner tree we need flexible edges, movable 
edges, and parallel edges. 

Problem Formulation: 

• Given a stable Rectilinear Steiner Tree 

• Maximize the flexibility of the RST 

• Subject to 

  - Topology remains unchanged (and thus if we do min-length 
    edge connection, total length remains unchanged) 
  - No initial flexible edge is degraded in flexibility 

Pseudocode for generating flexible Steiner tree: 

For each edge e 
{ 
    If e and its adjacent edges are a movable set 
    { 
        Create Movable Set 
        Check Overlap 
    } 
    For each movable set M 
        If M has no overlap 
        { 
            Move edge M 
        } 
    Move Overlapped edges 
} 

Suppose we start from an unstable Steiner tree. We apply the 
algorithm to convert it into a stable Steiner tree. Then we 
apply the algorithm to get the movable, flexible and 
parallel edges. Once we are finished with these parts, we apply 
the Generate flexible tree algorithm to produce flexibility in the 
Steiner tree. The following figure shows an example of the 
generation of a flexible Steiner tree when the input is an unstable 
Steiner tree. 

[Figure: unstable RST and stable RST] 




Flexibility function: Suppose we have a flexible edge as
shown in the figure, where w and L denote its dimensions.

Fig: Flexible edge with dimensions W and L

There are two general functions that can be used to compute
the flexibility:

1. $f_1 = w + L$

2. $f_2 = w \cdot L$

In case 1, $f_1 = w + L$ corresponds to wire length, so $f_1$
gives the change in wire length. Similarly, $f_2 = w \cdot L$
is the area bounded by the rectangle formed by W and L, so
$f_2$ gives the change in area.
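The two flexibility functions can be illustrated with a minimal Python sketch. The function names and the sample dimensions (w = 3, L = 4) are assumptions chosen for illustration; they do not come from the authors' C implementation.

```python
def f1(w, l):
    """Wire-length style flexibility: f1 = w + L."""
    return w + l


def f2(w, l):
    """Area style flexibility: f2 = w * L."""
    return w * l


# Hypothetical flexible edge with w = 3 and L = 4 grid units.
print(f1(3, 4))  # 7
print(f2(3, 4))  # 12
```

Which of the two is preferable depends on whether the change in wire length or the change in bounded area is the quantity of interest.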



4) Algorithm for finding the type of overlap that exists.

The two types of overlap that exist are:

• Overlap Type 1 occurs when a parallel edge of two movable
sets is the same and the inequalities mentioned in
intersection one hold.

• Overlap Type 2 occurs when the flexible edge of one movable
set is a parallel edge of the other movable set.

Pseudocode for finding Type 1 overlap:

For each movable edge m1
{   // p1 and p2 are the parallel edges
    for each movable edge m2 != m1
    {
        if (m1.p1 == m2.p1 or m1.p2 == m2.p2)
            return "overlap Type1"
    }
}

Pseudocode for finding Type 2 overlap:

For each movable edge m1
{   // p1 and p2 are the parallel edges; f1 and f2 are the flexible edges
    for each movable edge m2 != m1
    {
        if (m1.f1 == m2.p1 or m1.f2 == m2.p2)
            return "overlap Type2"
    }
}
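The two overlap checks can be sketched in Python as follows. This is only an illustration under stated assumptions: the `MovableSet` record, the representation of an edge as a pair of endpoints, and the sample coordinates are all hypothetical, and the inequality test mentioned for Type 1 is not modelled. The comparisons themselves mirror the pseudocode (same-position parallel edges for Type 1, flexible edge against parallel edge for Type 2).

```python
from dataclasses import dataclass
from itertools import combinations, permutations

Edge = tuple  # hypothetical representation: ((x1, y1), (x2, y2))


@dataclass(frozen=True)
class MovableSet:
    name: str
    p1: Edge  # first parallel edge
    p2: Edge  # second parallel edge
    f1: Edge  # first flexible edge
    f2: Edge  # second flexible edge


def type1_overlaps(sets):
    """Pairs of movable sets that share a parallel edge (symmetric test)."""
    return [(a.name, b.name) for a, b in combinations(sets, 2)
            if a.p1 == b.p1 or a.p2 == b.p2]


def type2_overlaps(sets):
    """Ordered pairs (a, b) where a flexible edge of a is a parallel edge of b."""
    return [(a.name, b.name) for a, b in permutations(sets, 2)
            if a.f1 == b.p1 or a.f2 == b.p2]


# Hypothetical movable sets, made up purely to exercise both checks.
A = MovableSet("A", ((0, 0), (0, 2)), ((3, 0), (3, 2)),
               ((0, 2), (0, 4)), ((3, 2), (3, 4)))
B = MovableSet("B", ((0, 0), (0, 2)), ((5, 0), (5, 2)),
               ((0, 2), (0, 5)), ((5, 2), (5, 4)))
C = MovableSet("C", ((0, 2), (0, 4)), ((9, 0), (9, 1)),
               ((0, 4), (0, 6)), ((9, 1), (9, 3)))

print(type1_overlaps([A, B, C]))  # [('A', 'B')]
print(type2_overlaps([A, B, C]))  # [('A', 'C')]
```

Unlike the pseudocode, which returns at the first hit, the sketch collects every overlapping pair, which may be more convenient when overlaps have to be resolved afterwards.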




Fig: Flexible edge of one set is the same as the movable edge
of the other



5) Algorithm for handling overlap:

Generating flexibility in a given stable RST is the final part
of the algorithm. The goal is to maximize the flexibility
function. Since only two types of overlap can exist in an RST,
any increase in flexibility is achieved only by resolving the
overlaps. When there is no overlap, the problem reduces to the
simple case of moving the movable edge as far as possible. For
both types of overlap, we need to measure the flexibility in
order to decide how to move the movable segments, and the
behavior of the overlap depends on the flexibility function. If
the movable sets overlap with each other, the maximum of the
overall flexibility function cannot be obtained by maximizing
the flexibility function of each movable set independently.
Therefore, dealing with overlaps depends on the definition of
the flexibility function. Here we discuss the mathematical
formulation of the flexibility function for overlaps of type I
and II. If we take the flexibility function to be

$g(x, y) = x \cdot y$

then the overall flexibility G(x,y) is expressed in terms of
the flexibility function of each of the movable sets. Hence,
the following expression gives the overall flexibility of an
RST, where the sum runs over the movable sets:

$G(x, y) = \sum g(x, y)$

Here, we derive equations for solving the overlaps of type I
and II.

In Fig. 5.1, we have a pair of movable edges exhibiting an
overlap of type I. $y_1$ and $y_2$ are the distances moved by
edges $s_1$ and $s_2$, respectively, so as to maximize the
flexibility:

$G(x, y) = w_1 y_1 + w_2 y_2$

Maximizing the above function yields the values of $y_1$ and
$y_2$ for which G(x,y) is maximum subject to certain
constraints. In Fig. 5, there are two movable edges in a
single set. The two edges do not have the same direction of
motion, so the mathematical formulation becomes tedious as the
number of movable edges in a given set increases beyond two.
The flexibility of the RST in Fig. 5.2 is given by:

$G(x, y) = (X_1 + X_2) y + Y_3 x - x y$

As the function depends on two variables, solving for its
maximum yields the maximum flexibility. When the number of
overlaps with a single movable edge exceeds three, it is
called a chain. As shown in the figure below, the formulation
of the mathematical equation then involves more than three
variables, the derivation of which is beyond the scope of this
project.
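To make the two-variable case concrete, the following Python sketch maximizes a function of the form G(x, y) = (X1 + X2)·y + Y3·x − x·y by exhaustive search over integer moves. The paper does not state the constraints, so the box bounds `x_max` and `y_max`, the integer grid, and all numeric values below are assumptions for illustration only; the search itself is a plain brute-force technique, not the authors' derivation.

```python
def flexibility(x, y, x1, x2, y3):
    """G(x, y) = (X1 + X2) * y + Y3 * x - x * y."""
    return (x1 + x2) * y + y3 * x - x * y


def maximize(x1, x2, y3, x_max, y_max):
    """Return (best G, x, y) over integer moves 0 <= x <= x_max, 0 <= y <= y_max.

    The bounds stand in for the unspecified constraints (hypothetical).
    """
    return max(
        (flexibility(x, y, x1, x2, y3), x, y)
        for x in range(x_max + 1)
        for y in range(y_max + 1)
    )


# Hypothetical dimensions X1 = 2, X2 = 3, Y3 = 6 with moves limited to 3 and 4.
print(maximize(2, 3, 6, 3, 4))  # (26, 3, 4)
```

Because the function is linear in each variable separately, its maximum over a rectangular region lies at a corner, so for larger grids it would suffice to evaluate only the four corner points.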




III. Experimental Results 

We executed our C program in Dev-C++ 4.9.9.2 on a PC-based
machine with a 3 GHz Pentium processor and 2 GB of RAM under
the Windows Vista operating system. The two inputs, the edges
of the Steiner tree and the Steiner points, are read from the
files edge.txt and spoint.txt. These two files contain input
taken from another module. For each set of input, we have
taken two sets of output. The results are shown in the
following table:

Table 1: Experimental Results

S.No | Input Edges | Input Steiner Nodes | Type 1 Overlap | Type 2 Overlap | Time of run (sec)
-----|-------------|---------------------|----------------|----------------|------------------
1 | 10,10-20,10; 20,10-30,10; 20,10-20,30; 30,10-30,30; 30,10-50,10 | 20,10; 30,10 | 0 | 0 | 0.07
2 | 10,10-20,10; 20,10-30,10; 20,10-20,30; 30,10-30,30; 30,10-50,10 | 20,10; 30,10 | 0 | 0 | 0.066
3 | 60,50-60,60; 60,60-60,80; 60,60-80,60; 80,60-80,70; 80,70-76,70; 80,60-90,60 | 60,60; 80,60 | 0 | 0 | 0.088
4 | 60,60-60,80; 60,60-80,60; 80,60-80,70; 80,70-76,70; 80,60-90,60; 90,40-90,60 | 60,60; 80,60 | 0 | 0 | 0.086
5 | 10,50-60,50; 60,50-60,160; 60,160-60,190; 60,160-110,160; 110,160-110,101; 10,160-110,230 | 60,160; 110,160 | 0 | 0 | 0.07
6 | 10,50-60,50; 60,50-60,160; 60,160-60,190; 60,160-110,160; 110,160-110,101; 10,160-110,230 | 60,160; 110,160 | 0 | 0 | 0.069
7 | 10,150-40,150; 40,150-40,110; 40,150-110,150; 110,150-110,200; 110,150-110,80; 110,80-110,50; 110,80-210,80; 210,80-210,130; 210,80-250,80 | 40,150; 110,150; 110,80; 210,80 | Between 40,150-110,150 & 110,80-210,80 | 0 | 0.107
8 | 10,150-40,150; 40,150-40,110; 40,150-110,150; 110,150-110,200; 110,150-110,80; 110,80-110,50; 110,80-210,80; 210,80-210,130; 210,80-250,80 | 40,150; 110,150; 110,80; 210,80 | Between 40,150-110,150 & 110,80-210,80 | 0 | 0.109
9 | 0,0-1,0; 1,0-3,0; 3,0-4,0; 3,0-3,1; 1,0-1,1; 1,1-2,1; 1,1-1,2 | 1,1; 1,0; 3,0 | 0 | Between 1,0-3,0 & 1,1-1,0 | 0.84
10 | 0,0-1,0; 1,0-3,0; 3,0-4,0; 3,0-3,1; 1,0-1,1; 1,1-2,1; 1,1-1,2 | 1,1; 1,0; 3,0 | 0 | Between 1,0-3,0 & 1,1-1,0 | 0.99
11 | 1,0-2,0; 2,0-2,2; 2,2-2,0; 2,2-2,3; 2,0-3,0; 3,0-3,1; 3,0-4,0 | 2,2; 2,0; 3,0 | 0 | Between 2,2-2,0 & 2,0-3,0 | 0.125
12 | 1,0-2,0; 2,0-2,2; 2,2-2,0; 2,2-2,3; 2,0-3,0; 3,0-3,1; 3,0-4,0 | 2,2; 2,0; 3,0 | 0 | Between 2,2-2,0 & 2,0-3,0 | 0.098
13 | 0,2-1,2; 1,2-1,1; 1,2-3,2; 3,2-3,3; 3,2-3,0; 2,0-3,0; 3,0-4,0; 4,0-4,1; 4,0-5,0 | 1,2; 3,2; 3,0; 4,0 | Between 1,2-3,2 & 3,0-4,0 | Between 1,2-3,2 & 3,2-3,0; 3,2-3,0 & 3,0-4,0 | 0.164
14 | 0,2-1,2; 1,2-1,1; 1,2-3,2; 3,2-3,3; 3,2-3,0; 2,0-3,0; 3,0-4,0; 4,0-4,1; 4,0-5,0 | 1,2; 3,2; 3,0; 4,0 | Between 1,2-3,2 & 3,0-4,0 | Between 1,2-3,2 & 3,2-3,0; 3,2-3,0 & 3,0-4,0 |


IV. Advantages of Project 

1. This project can be used to make a Steiner tree flexible,
so that the number of intersections can be reduced when more
than one Steiner tree is used in a grid.

2. If the number of intersections can be reduced by using the
concept of flexibility in a Steiner tree, then the number of
layers may also decrease.

3. As the effective topology remains the same and the number
of intersections decreases, leading to a decrease in layering,
the effective cost of the VLSI chip may also decrease.

V. Criticism

If more than two overlaps exist, there is a chance that the
software may not work.

VI. Conclusions and future work: 

This implementation is simple to understand and easy to use,
since it uses conventional data structures such as structs and
arrays together with procedural functions and procedural code.
Experimental results show that our method works well for the
defined problem. In this project, finding a solution to the
problem was more important than the running time. We believe
that the implementation and methods in our work can help the
design of routing tools in the future. The code can be used to
reduce the number of intersections in a multilayer
environment.

The running time of the implemented algorithm is $O(n^3)$ in
the worst case, and further improvement of the running time
is possible by applying better programming techniques.

Abstract- Estimation models in software engineering are
used to predict important attributes of future entities,
such as development effort, software reliability and
programmer productivity. Among these models, those
estimating software effort have motivated considerable
research in recent years [COSMIC, (2000)]. In this paper we
discuss existing work on effort estimation methods and
propose a hybrid method for the effort estimation process
[Briand et al., (1998)]. As an initial approach to the hybrid
technology, we have developed a simple approach to software
effort estimation (SEE) based on Use Case Models: the “Use
Case Points Method” [Briand et al., (1998)]. This method is
not new, but it has not become popular although it is easy to
understand and implement. We have therefore investigated this
promising method, which is inspired by Function Points
Analysis [Albrecht, (1994)]. Reliable estimates can be
calculated with our method in a short time with the aid of a
spreadsheet, and we plan to extend its applicability to
estimating risk and benchmarking measures [Briand et al.,
(1998)] [Sentas et al., (2005)].

Keywords: Effort Estimation; Cost Refinement; Function
Points; Use Case Points; Risk Assessment; Hybrid Method;
Benchmarking.



I. Introduction 

The planning, monitoring and control of software
development projects require that effort and costs be
adequately estimated. However, some forty years after the
term “software engineering” was coined [Jorgenson and
Shepperd, (2007)], effort estimation still remains a challenge
for practitioners and researchers alike. There is a large body
of literature on software effort estimation models and
techniques, which includes discussion of the relationship
between software size, as a primary predictor, and effort
[Albrecht, (1994)] [Albrecht and Gaffney, (1983)] [Abts and
Chulani, (2000)] [Boehm, (1981)] [Anda et al., (2001)] [Arnold
and Pedross, (1998)] [Basili and Freburger, (1998)]. These
studies conclude that the models, which are used by different
groups and in different domains, have still not gained
universal acceptance [Guruschke and Jorgensen, (2005)]. As the
role of software in society becomes larger and more important,
it becomes necessary to develop a package that can estimate
effort within a short period. To achieve this goal, the entire
software development process should be managed by an
effective model. Our proposed model therefore focuses on
three basic parameters: 1. Software effort estimation 2.
Benchmarking 3. Risk assessment. So far, several models
and techniques have been proposed and developed [Boehm
and Royce, (1992)] [Anda et al., (2001)] [Symons, (1991)], and
most of them include “Software Size” as an important
parameter. The graph below shows the application of
software engineering principles and standards in
medium-sized organizations.

Fig: Reference: The Application of software
engineering standards in very small enterprises, Vol 3,
Issue 4

The Use Case Model can be used to predict the size of the
future software system at an early development stage. To
estimate the effort in the early phase of software
development, the Use Case Point method has been proposed
[Smith, (1991)]. The Use Case Point method is influenced by
the Function Points method and is based on analogous use case
points [Smith, (1991)].

We have been developing a hybrid model to estimate the effort
in the early phase of software development [Briand et al.,
(1998)]. This paper describes how the Use Case Points method
is introduced to software projects for estimating effort. The
paper also describes the automatic classification of actors
and use cases in the UCP model, rather than doing it
manually. The result of this paper will be taken as a base for
developing a hybrid method which will be used for
benchmarking and risk assessment [Sentas et al., (2005)].



II. Problem Framework 

Our understanding of the effort-estimation problem arises 
from the idea that any software project is the result of a set 
of business goals that emerge from a desire to exploit a 
niche in the marketplace with a new software product. Take, 
for example, the development of an application server that 
caters to on-demand software. The business goals of having 
a robust, high-performance, secure server lead to a set of 
architectural decisions whose goal is to realize specific 
quality-attribute requirements of the system (e.g., using
tri-modular redundancy to satisfy the availability requirements,
a dynamic load-balancing mechanism to meet the
performance requirements, and a 256-bit encryption scheme
to satisfy the security requirements). Each architecture A
that results from a set {Ai} of architectural decisions has a
different set of costs C{Ai} (Fig. 2). The choice of a
particular set of architectural decisions maps to system 
qualities that can be described in terms of a particular set of 
stimulus/response characteristics of the system {Qi}, i.e., Ai 
-> Qi. (For example, the choice of using concurrent 
pipelines for servicing requests in this system leads to a 
predicted worst-case latency of 500 ms, given a specific rate 
of server requests.) The “value” of any particular 
stimulus/response characteristic chosen is the revenue that 
could be earned by the product in the marketplace owing to 
that characteristic. We believe that the software architect
should attempt to maximize the difference between the
value generated by the architecture (Va) and its cost (C).




Fig: Business goals drive the architectural decisions {Ai}, 
which determine the quality attributes {Qi}. Value (Va) 
depends on Qi and Cost(C) depends on Ai. 



III. Related Work 

To date, several studies [Boehm et al., (2001)]
[Boehm et al., (1995)] and case studies have been reported
on Use Case Points and on effort estimation based on the Use
Case Model [COSMIC, (2000)]. Smith proposed a method to
estimate lines of code from a use case diagram
[Smith, (1999)] [Aggarwal et al., (2005)]. Arnold and Pedross
reported that the Use Case method can be used to estimate the
size of software [Arnold and Pedross, (1998)]. They also
suggested that the Use Case Point method should be used
together with other estimation methods to get the best result.



IV. Limitations Of Function Points 

Function Points are a measure of software size expressed in
functional terms; the measured size stays constant
irrespective of the programming language and environment
used [IFPUG, (2002)]. Function Point counting requires
detailed information about the software. Such detailed
information becomes available in the software design
specification. Function Point evaluation is therefore
difficult for software with a short development time [Hajri
et al., (2005)]. In practice, estimating software at an
earlier phase of the development life cycle reduces risk.
To estimate the effort accurately in the earlier phases of the
development life cycle, the Use Case Point method has
been proposed [Smith, (1999)].



V. Use Case Point Method 

This section briefly explains how the Use Case Point method
has been implemented in our model [Smith, (1999)].

A. Use case point method

The first step is to calculate Use Case Points (UCP) from the
Use Case Model [Smith, (1999)]. The Use Case Model mainly
consists of two kinds of documents, system or subsystem
documents and use case documents, which contain the
following items: system name, risk factors, system-level use
case diagram, architecture diagram, subsystem descriptions,
use case name, brief description, context diagram,
preconditions, flow of events, postconditions, subordinate
use case diagrams, subordinate use cases, activity diagram,
view of participating classes, sequence diagrams, user
interface, business rules, special requirements and other
artifacts [Schneider and Winters, (2001)].

Of the items listed above, we focus mainly on two: the
system-level use case diagram and the flow of events. The
system-level use case diagram includes one or more use case
diagrams showing all the use cases and actors in the system
[Schneider and Winters, (2001)]. The flow of events includes
a section for the normal path and for each alternative path
in each use case.
Figure 2 shows a part of the flow of events of the use case
“SESSION” in Figure 1.




Fig: An example of a system-level use case diagram for an
ATM system

A session is started when a customer inserts an ATM card
into the card reader slot of the machine.

The ATM pulls the card into the machine and reads it.

If the reader cannot read the card due to improper insertion
or a damaged stripe, the card is ejected, an error screen is
displayed, and the session is aborted.

The customer is asked to enter his/her PIN, and is then
allowed to perform one or more transactions, choosing from
a menu of possible types of transaction in each case.

Fig: Flow of Events (Session Use Case)

B. Counting use case point: 

Intuitively, UCP is measured by counting the number of
actors and transactions included in the flow of events, each
with some weight. A transaction is an event that occurs
between an actor and the target system, the event being
performed entirely or not at all. In our method, the effort
estimate is calculated by applying the following procedure.

a) Counting actor’s weight

The actors in the use case are categorized as simple, average
or complex. A simple actor represents another system with a
defined API. An average actor is either another system that
interacts through a protocol such as TCP/IP or a person
interacting through a text-based interface. A complex actor
is a person interacting through a GUI.



Type    | Description                     | Factor
--------|---------------------------------|-------
Simple  | Program Interface               | 1
Average | Interactive, or Protocol Driver | 2
Complex | Graphical User Interface        | 3

Table - 1



The number of each actor type that the target software 
includes is calculated and then each number is multiplied by 
a weighting factor shown in TABLE - 1. Finally, actor’s 
weight is calculated by adding those values together. 
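The actor weight calculation can be sketched in Python as follows. The factors are those of Table - 1; the function name and the sample actor counts are assumptions made for illustration.

```python
# Weighting factors from Table - 1.
ACTOR_FACTORS = {"simple": 1, "average": 2, "complex": 3}


def actor_weight(actor_counts):
    """Sum of (number of actors of a type) * (factor of that type)."""
    return sum(ACTOR_FACTORS[kind] * count for kind, count in actor_counts.items())


# Hypothetical system with 1 simple, 2 average and 3 complex actors.
print(actor_weight({"simple": 1, "average": 2, "complex": 3}))  # 14
```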

b) Counting use case weights 

Each Use case should be categorized into simple, average or 
complex based on the number of transactions including the 
alternative paths. A simple use case has 3 or fewer 
transactions, an average use case has 4 to 7 transactions and 
a complex use case has more than 7 transactions. 

Then, the number of each use case type is counted in the 
target software and then each number is multiplied by a 
weighting factor shown in Table - 2. 



Type    | Description              | Factor
--------|--------------------------|-------
Simple  | 3 or fewer transactions  | 5
Average | 4 to 7 transactions      | 10
Complex | More than 7 transactions | 15

Table - 2. Transaction Based Weighting Factors

Finally, use case weight is calculated by adding these values
together.
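A minimal Python sketch of this step is given below. The thresholds and factors follow Table - 2; the function names and the sample transaction counts are hypothetical.

```python
# Weighting factors from Table - 2.
USE_CASE_FACTORS = {"simple": 5, "average": 10, "complex": 15}


def classify_use_case(transactions):
    """Map a transaction count (alternative paths included) to a complexity class."""
    if transactions <= 3:
        return "simple"
    if transactions <= 7:
        return "average"
    return "complex"


def use_case_weight(transaction_counts):
    """Sum the factor of every use case, given one transaction count per use case."""
    return sum(USE_CASE_FACTORS[classify_use_case(t)] for t in transaction_counts)


# Hypothetical system with four use cases of 2, 5, 9 and 7 transactions.
print(use_case_weight([2, 5, 9, 7]))  # 40
```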

c) Calculating unadjusted use case points 

The unadjusted use case points (UUCP) are calculated by adding
the total weight for actors to the total weight for use cases.



Factor | Description                                   | Weight
-------|-----------------------------------------------|-------
T1     | Distributed System                            | 3
T2     | Response or Throughput Performance Objectives | 4
T3     | End-User Efficiency (online)                  | 5
T4     | Complex Internal Processing                   | 2
T5     | Code must be readable                         | 3
T6     | Easy to install                               | 5
T7     | Easy to use                                   | 5
T8     | Portable                                      | 2
T9     | Easy to Change                                | 5
T10    | Concurrent                                    | 1
T11    | Includes special security features            | 4
T12    | Provides direct access for third parties      | 2
T13    | User training facilities required             | 2

Table - 3



d) Weighting technical and environmental factors

The UUCP are adjusted based on the values assigned to a
number of technical and environmental factors shown in
Tables 3 & 4.



Factor | Description                                | Weight
-------|--------------------------------------------|-------
F1     | Familiar with the Rational Unified Process | 4
F2     | Application Experience                     | 3
F3     | Object-Oriented Experience                 | 2
F4     | Lead Analyst Capability                    | 3
F5     | Motivation                                 | 5
F6     | Stable Requirements                        | 4
F7     | Part-Time Workers                          | 3
F8     | Difficult Programming Language             | 3

Table - 4



Method: 

Each factor is assigned a value between 0 and 5 depending 
on its assumed influence on the project. A rating of 0 means 
the factor is irrelevant for this project and 5 means it is 
essential. 



Calculation of TCF:

The technical complexity factor (TCF) is calculated by
multiplying the value of each factor (T1 - T13) in Table - 3
by its weight and then adding all these numbers to get the
sum called the TFactor. Finally, the following formula is
applied:

TCF = 0.6 + (0.01 * TFactor)

Calculation of environmental factor:

The environmental factor (EF) is calculated accordingly by
multiplying the value of each factor (F1 - F8) in Table - 4
by its weight and adding all the products to get the sum
called the EFactor. Finally, the following formula is applied:

EF = 1.4 + (-0.03 * EFactor)

Calculating UCP

The adjusted Use Case Points (UCP) are calculated by

UCP = UUCP * TCF * EF    (3)




Figure 3: Calculating Use Case Point 
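The adjustment steps can be put together in a short Python sketch. It assumes the commonly used additive form of the environmental factor, EF = 1.4 + (-0.03 * EFactor), and it applies the factor of 20 man-hours per UCP that the text attributes to [Smith,(1999)]. All weights, ratings and the UUCP value in the example are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def weighted_sum(weights, ratings):
    """TFactor or EFactor: sum of weight * assigned value (0..5) per factor."""
    if len(weights) != len(ratings):
        raise ValueError("weights and ratings must have the same length")
    return sum(w * r for w, r in zip(weights, ratings))


def tcf(t_factor):
    """Technical complexity factor: TCF = 0.6 + (0.01 * TFactor)."""
    return 0.6 + 0.01 * t_factor


def ef(e_factor):
    """Environmental factor, assumed additive form: EF = 1.4 + (-0.03 * EFactor)."""
    return 1.4 + (-0.03 * e_factor)


def ucp(uucp, tcf_value, ef_value):
    """Adjusted use case points: UCP = UUCP * TCF * EF."""
    return uucp * tcf_value * ef_value


# Hypothetical project: two technical and two environmental factors only.
t_factor = weighted_sum([2, 1], [3, 5])      # 2*3 + 1*5 = 11
e_factor = weighted_sum([1.5, 0.5], [4, 2])  # 1.5*4 + 0.5*2 = 7.0
uucp = 14 + 40                               # actor weight + use case weight
points = ucp(uucp, tcf(t_factor), ef(e_factor))
effort_hours = points * 20                   # 20 man-hours per UCP
print(round(points, 2), round(effort_hours, 1))
```

Keeping each factor in its own function makes it straightforward to replace the man-hours constant or the factor tables with values calibrated on past projects.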



Estimating effort:

The effort is calculated by multiplying the UCP by a specific
number of man-hours per UCP. In [Smith, (1999)], a factor of
20 man-hours per UCP for a project is suggested. The entire
procedure is shown diagrammatically above.

Global Journal of Computer Science and Technology

Page | 38

Research Method

Based on the proposed method, we plan to develop a framework
[Alistair, (2000)] as an automated tool under the name
[Hybrid Tool]. The input is an XMI file. The tool is
implemented in Java, and the Xerces2 Java Parser is used to
analyze the model file [OMG, (2005)].

We plan to start the automated tool with a minimal set of
keywords. At later stages, new keywords will be added
automatically and can be used for later projects.



VI. An Automated Tool For Estimating Use Case Point

(1) Overview

In order to effectively introduce the Use Case Point method
into software development, we have decided to create a Use
Case Point measurement tool [Smith, (1999)]. Several existing
tools are based on the Use Case Model, but in all of them the
complexity of actors and use cases must be judged manually.
This judgment is the most important part of software cost
estimation, so we have decided to automate it. To carry out
the entire procedure described in Section V automatically, it
is necessary to define a set of rules for classifying the
weight of actors and use cases (Section V.B).

It is also necessary to write the Use Case Model in a
machine-readable format. We therefore assume that the use
case model is written in XMI (XML Metadata Interchange)
[OMG, (2005)]. This file format was chosen because most CASE
tools for drawing UML diagrams support exporting them as XMI
files [OMG, (2005)].
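As an illustration of reading actors and use cases from such a file, the following Python sketch parses a small XMI fragment with the standard `xml.etree.ElementTree` module. It is not the Java/Xerces implementation described in the paper. The sample document, its element layout (`packagedElement` with an `xmi:type` attribute) and the namespace URIs are assumptions; real exports differ between tools and XMI versions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; it must match the one declared in the input file.
XMI_NS = "http://schema.omg.org/spec/XMI/2.1"

# Hypothetical, heavily simplified XMI export of an ATM use case model.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
                         xmlns:uml="http://schema.omg.org/spec/UML/2.1">
  <uml:Model name="ATM">
    <packagedElement xmi:type="uml:Actor" name="Customer"/>
    <packagedElement xmi:type="uml:Actor" name="Bank Server"/>
    <packagedElement xmi:type="uml:UseCase" name="Session"/>
  </uml:Model>
</xmi:XMI>"""


def extract_actors_and_use_cases(xmi_text):
    """Return (actor names, use case names) found anywhere in the document."""
    root = ET.fromstring(xmi_text)
    type_attr = "{%s}type" % XMI_NS  # ElementTree expands prefixed attribute names
    actors, use_cases = [], []
    for elem in root.iter():
        kind = elem.get(type_attr)
        if kind == "uml:Actor":
            actors.append(elem.get("name"))
        elif kind == "uml:UseCase":
            use_cases.append(elem.get("name"))
    return actors, use_cases


actors, use_cases = extract_actors_and_use_cases(SAMPLE_XMI)
print(actors)     # ['Customer', 'Bank Server']
print(use_cases)  # ['Session']
```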

(2) Rules for weighting actors 

As described in Section V.B, the weight for each actor is
determined by the interface between the actor and the target
software. However, the interface information is not available
in the actor description; only the name of the actor is
available. It is therefore essential to define a set of rules
that determines the complexity of an actor.

Classification based on actor’s name

In the first step of the classification, we determine whether
the actor is a person or an external system based on the name
of the actor. To do this, we prepare beforehand a list of
keywords that can appear in the name of a software system.

For example, the keywords “system” and “server” are
used in a system’s name.

Keywords for step 1 (KLa): System, Server, Application, Tool.



Classification based on keywords included in use case 

Here, we classify the actor based on the flow of events to
which the actor is relevant. Initially, we plan to develop
three sets of keywords, one for each complexity level of
actor. We then extract all words included in the flow of
events and match them against each keyword in the lists.
Finally, the actor’s weight is assigned according to the
complexity of the keyword list that best fits the words in
the flow of events.

Keywords for simple actor (KLsa): Request, Send, Inform.

Keywords for complex actor (KLca): Press, Push, Select, Show, GUI,
Window.

Keywords for average actor (person) (KLaap): Command, Text, I/P,
CUI.
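The first two classification steps can be sketched in Python as shown below. The keyword lists are the ones given in the text; everything else is an assumption made for illustration: the tokenizer, the exact-word matching (no stemming, so "presses" does not match "press"), the scoring by match count, and the choice to return `None` when no list clearly fits so that a later step can decide.

```python
import re

# Keyword lists taken from the text.
SYSTEM_NAME_KEYWORDS = {"system", "server", "application", "tool"}      # KLa
KEYWORD_LISTS = {
    "simple": {"request", "send", "inform"},                            # KLsa
    "complex": {"press", "push", "select", "show", "gui", "window"},    # KLca
    "average": {"command", "text", "i/p", "cui"},                       # KLaap
}


def words(text):
    """Lower-case word tokens; '/' is kept so that 'I/P' stays one token."""
    return re.findall(r"[a-z/]+", text.lower())


def is_external_system(actor_name):
    """Step 1: the actor is treated as a system if its name contains a KLa keyword."""
    return any(w in SYSTEM_NAME_KEYWORDS for w in words(actor_name))


def classify_by_flow(flow_of_events):
    """Step 2: pick the complexity whose keyword list matches the most words."""
    tokens = words(flow_of_events)
    scores = {level: sum(t in kws for t in tokens)
              for level, kws in KEYWORD_LISTS.items()}
    best = max(scores.values())
    winners = [level for level, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # undecided: left to the experience-data step
    return winners[0]


print(is_external_system("Bank Server"))   # True
print(is_external_system("Customer"))      # False
print(classify_by_flow("The customer can select a transaction and push "
                       "a button; the window will show a menu."))  # complex
```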



Classification based on experience data: 

If we are unable to determine the actor’s weight at step 2,
we determine it based on experience data. The experience
data includes information about the Use Case Models and Use
Case Points developed in past software projects.

(3) Rules For Weighting Use Cases 

As described in Section V.B, the complexity of a use case is
determined by the number of transactions. We therefore focus
on the flow of events in the Use Case Model. The simplest way
to count transactions is to count the number of events.
However, there is no standard procedure for writing the flow
of events, and several transactions may be described in one
event. Because of this limitation, several guidelines for
writing events in a use case model have been proposed
[Schneider and Winters, (2001)]. There are ten guidelines for
writing a successful scenario. Among them, we focus on the
following two:

(G1) Use a simple grammar.
(G2) Include a reasonable set of actions.

Jacobson suggests that the following four pieces of a
compound interaction should be described:

The primary actor sends request and data to the system,

The system validates the request and the data,

The system alters its internal state, and

The system responds to the actor with the result.

Based on these guidelines, we propose to analyze the events
using morphological analysis and syntactic analysis. Through
these analyses, we obtain the morphemes of each statement and
the dependency relations between the words in the statement.
We conduct the morphological analysis for all statements and
obtain the subject word and predicate word of each statement.

Then, we apply the following rules:

Rule U-1: We regard each pair of subject and predicate words
as a candidate transaction.

Rule U-2: Among the candidates, we identify those related to
an actor's operation and system response as transactions.

For each use case, we apply the above rules to obtain the
number of transactions. Then, based on the number of
transactions, we determine the complexity of each use case.
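A rough Python sketch of the two rules is given below. It assumes that the morphological and syntactic analysis has already produced one (subject, predicate) pair per statement, which is the hard part and is not shown. Treating a candidate as a transaction when its subject is a known actor or the target system is one possible reading of Rule U-2, and the sample pairs are hypothetical.

```python
def count_transactions(candidates, actor_names, system_name="system"):
    """Count candidate (subject, predicate) pairs whose subject is an actor or the system.

    Rule U-1: every pair in `candidates` is a candidate transaction.
    Rule U-2 (assumed reading): keep pairs describing an actor's operation
    or a system response, identified here by the subject word only.
    """
    relevant = {name.lower() for name in actor_names} | {system_name.lower()}
    return sum(1 for subject, _predicate in candidates
               if subject.lower() in relevant)


def classify_use_case(transactions):
    """Complexity classes from the transaction-based weighting table."""
    if transactions <= 3:
        return "simple"
    if transactions <= 7:
        return "average"
    return "complex"


# Hypothetical output of the analysis for part of an ATM session.
candidates = [
    ("customer", "inserts"),
    ("ATM", "pulls"),
    ("ATM", "reads"),
    ("card", "is ejected"),   # subject is neither an actor nor the system
    ("customer", "enters"),
]
n = count_transactions(candidates, actor_names=["customer"], system_name="ATM")
print(n, classify_use_case(n))  # 4 average
```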




Figure 4: Automated Tool 



VII. Conclusion & Future Work 

This paper has proposed an automated Hybrid Tool which
calculates Use Case Points from Use Case Models in XMI
files [OMG, (2005)]. We will use the effort estimates produced
by this Hybrid Tool in the hybrid technology proposed for
risk assessment and benchmarking. We will also extend this
technique to develop an automated tool for assessing
risk and effort.